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Abstract 

Suppose the signal x £ K" is realized by driving a fc-sparsc signal u G R" through an arbitrary 
unknown stable discrete-linear time invariant system H, namely, x(t) = (h * u)(t), where h(-) is 
the impulse response of the operator H. These types of processes arise naturally in Reflection 
Seismology. In this paper we are interested in several problems: (a) Blind-Dcconvolution: Can 
we recover both the filter H and the sparse signal u from noisy measurements? (b) Compressive 
Sensing: Is x(-) compressible in the conventional sense of compressed sensing? Namely, can x(t) 
be reconstructed from a sparse set of measurements. We develop novel i\ minimization methods 
to solve both cases and establish sufficient conditions for exact recovery for the case when the 
unknown system H is auto-regressive (i.e. all pole) of a known order. In the compressed 
sensing/sampling setting it turns out that both H and x can be reconstructed from 0(k log(n)) 
measurements under certain technical conditions on the support structure of u. Our main 
idea is to pass x through a linear time invariant system G and collect 0(fclog(n,)) sequential 
measurements. The filter G is chosen suitably, namely, its associated Toeplitz matrix satisfies 
the RIP property. We develop a novel LP optimization algorithm and show that both the 
unknown filter H and the sparse input u can be reliably estimated. 

1 Introduction 

In this paper we focus on blind de-convolution problems for filtered sparse processes. Specifically, 
a sparse input u(t) is filtered by an unknown infinite impulse response (IIR) discrete time stable 
linear filter H and the resulting output 

x(t) = (Hu)(t) = ^2u(n)h(t - n) 

i 

is measured in Gaussian noise, namely, y(t) = x(t) + n(t) for t = 0, 1, ... ,N. The goal is to detect 
u(t), and estimate the filter H. We also consider the compressed sensing problem, namely, x(t) 
is compressed by means of a random Gaussian ensemble analogous to compressed sensing and the 
resulting output is measured noisily. Our task is to again detect u(t) and estimate H. Our goal 
is to characterize the minimum number of random samples required for accurate detection and 
estimation. 
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Filtered sparse processes [?] arise in a number of contexts such as nuclear radiation [?], neu- 
ronal spike trains [?], reflection seismology [?], and communications [?]. For instance, in reflection 
seismology [?], the system H is commonly referred to as the wavelet and serves as the input signal. 
This input signal passes through the different layers of earth. Each layer of the earth reflects the 
signal with some attenuation and the superposition of the reflected signals from the different layers 
of the earth is received as a seismic trace by a geophone located on the earth's surface. The exact 
input signal or the wavelet is unknown in some cases (such as when explosives or airguns are used 
as input signals) and not precisely known in other cases. The superposition of reflected signals can 
be expressed as a convolution of the wavelet with a sparse input, associated with the different layers 
of the earth. This results in the blind de-convolution problem for filtered sparse processes since 
the wavelet, the reflected coefficients and the associated delays are all unknown. Similar problem 
arises in many other applications such as nuclear radiation [?]. 

While blind de-convolution has been well studied in the context of image de-blurring (see [?]), 
blind de-convolution for filtered sparse processes has not been subjected to significant attention. 
On the other hand several researchers have studied sparse deconvolution [?, ?, ?, ?] of filtered sparse 
processes, namely, for the case when the filter H is known. The main approach proposed heretofore 
involves iterative schemes such as the block decomposition scheme proposed by [?]. Herein the filter 
H is first approximately estimated. The sparse input z is then estimated using a de-convolution 
algorithm. The filter H is then updated and this procedure is recursed. There are several drawbacks 
in this approach. First, there are no guarantees for convergence. Second, the de-convolution step 
is usually computationally expensive. It is either formulated as a combinatorial problem [?] and 
several methods including MCMC methods [?] have been recently proposed. 

We propose l\ minimization techniques for joint estimation and detection of H and u respec- 
tively. We show under mild technical assumptions we demonstrate that both H and u can be 
exactly recovered both in the noiseless as well as noisy scenarios. In addition we consider the 
associated compressed sensing problem as illustrated in Figure ??. To the best of our knowledge 
compressive sensing for filtered sparse processes and associated blind de-convolution is entirely new 
and has not appeared before. Compressed sensing in this context can be interpreted from two 
different viewpoints: (a) As a sampling scheme whereby the output x(t) is passed through a filter, 
whose coefficients are drawn from a random IID ensemble; (b) As a random excitation scheme, 
whereby the composite linear system x(t) is excited by a random input. Both these perspectives 
turn out to be equivalent since convolution is commutative. We then derive recovery results in the 
compressed context. Our results show that if u(t) is a K-sparse signal and H is an AR process of 
known order, the support of z can be recovered from 0(fclog 3 (ra)) measurements. 
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Figure 1: Block diagram illustrating two equivalent perspectives of compressed blind de-convolution: (a) 
As a sampling scheme realized as the output of a random filter, (b) As a random input excitation. 

At first glance the problem as posed appears difficult. For one there is no reason GH satisfies 
isometry property when H is not orthonormal. In addition Hu is neither sparse nor approximately 
sparse as illustrated in Figure ??. Here the signal x has been generated by passing a 10-sparse 
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Abstract 

Suppose the signal x € R" is realized by driving a k-sparsc signal u € R" through an 
arbitrary unknown stable discrete-linear time invariant system H, namely, x(t) = (h * u)(t), 
where h(-) is the impulse response of the operator H. Is x(-) compressible in the conventional 
sense of compressed sensing? Namely, can x(t) be reconstructed from small set of measurements 
obtained through suitable random projections? For the case when the unknown system H is 
auto-regressive (i.e. all pole) of a known order it turns out that x can indeed be reconstructed 
from 0(fclog(n)) measurements. We develop a novel LP optimization algorithm and show that 
both the unknown filter H and the sparse input z can be reliably estimated. 

1 Introduction 

In this paper we focus on blind de-convolution problems for filtered sparse processes. These types of 
processes naturally arise in reflection seismology pQ . The LTI system H is commonly referred to as 
the wavelet, which can be unknown, and serves as the input signal. This input signal passes through 
the different layers of earth and the reflected signal z corresponds to the reflection coefficients from 
the different layers. The signal z is typically sparse. The reflected output, which is referred to as 
the seismic trace, is recorded by a geophone. Other applications of filtered sparse processes include 
nuclear radiation [2j, neuronal spike trains [3] and communications [3]. 

Specifically, a sparse input u(t) is filtered by an unknown infinite impulse response (IIR) discrete 
time stable linear filter H and the resulting output 

x(t) = (Hu){t) = J2u(n)h(t - n) 

i 

is measured in Gaussian noise, namely, y(t) = x(t) + n(t) for t = 0, 1, . . . , N. The goal is to detect 
z(t), and estimate the filter H. The main approach heretofore proposed for blind de-convolution 
involves heuristic iterative block decomposition schemes (first proposed in [5]). Here the filter and 
sparse inputs are alternatively estimated by holding one of them constant. While these algorithms 
can work in some cases, no systematic performance guarantees currently exist. We explore a convex 
optimization framework for blind de-convolution. 
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In addition we consider the compressed sensing problem, namely, x(t) is compressed by means 
of a random Gaussian filter ensemble, as described in Figure [Hand the resulting output is measured 
noisily. Analogously, we can consider a random excitation model as in Figure (2) Our task is to 
detect z(t) and estimate H. Our goal is to characterize the minimum number of random samples 
required for accurate detection and estimation. 
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Figure 1: Compressed blind de-convolution. 
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Figure 2: Estimation of filtered sparse process: Random Excitation. 



1.1 Comparison with Compressed Sensing 

Note that this is significantly different from the standard Compressed sensing(CS) [61 [7] problem. 
In standard CS we have a signal or image, x G M n , which is sparse in some transform domain. 
Specifically, there is a known orthonormal matrix H such that the transformed signal z = H T x is 
k-sparse, namely, has fewer than k non-zero component^]. A matrix G £ ]R mxn then maps x to 
measurements y = Gx = GHu. For suitable choices of matrices G, such as those satisfying the so 
called Restricted Isometry Property (RIP), the k-sparse signal z can be recovered with 0(k\og(n)) 
measurements as a solution to a convex optimization problem: 

min||n||i subject to y = GHu 

This result holds for all sparsity levels k < an, a < I, for sufficiently small a. There has been 
significant effort in CS in recent years leading to various generalizations of this fundamental re- 
sult. This includes the case when the signal x is approximately sparse (see [8j [9]) and when the 
measurements are noisy, i.e., y = GHu + e (see [9]). 

This paper is a significant extension of CS to cases where H is not only not orthonormal but 
also arbitrary and unknown. Specifically, H, is a causal discrete linear time invariant system (LTI) 
with an unknown impulse response function h(-) as described above. A typical signal x is neither 
sparse nor approximately sparse as we will see in Section [3 

1.2 Our Approach 

Our CS problem (schematically shown in Figures LT][2]) boils down to determining whether there is 
a sampling operator G with 0(klog(n)) samples such that the signal x can be recovered uniquely 

1 This is often referred to as transform domain sparsity. 
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from the samples y = Gx = GHu using a convex optimization algorithm. It turns out that this is 
indeed the case when H is belongs to the set of stable finite dimensional AR processes of a known 
order. 

At first glance the problem as posed appears difficult. For one there is no reason GH satisfies 
isometry property when H is not orthonormal. To build intuition we describe a practically relevant 
problem. A specific example is when a; is a one-dimensional piecewise constant signal. Such a signal 
is not sparse but does have a sparse derivative, namely, u(t) = x(t) — x(t — 1) is sparse. Clearly, 
the signal x can represented as an output of an (integral) operator H acting on a sparse input u, 
namely, x = Hu. However, H is no longer orthonormal. To account for this scenario one usually 
minimizes the total variation (TV) of the signal. A compressed sensing perspective for this case 
has already been developed [ID] . 

We develop an alternative approach here. Suppose we now filter x through an LTI system G 
whose impulse response is git). Mathematically, we have, 

y (t) = [g*x)(t) = (g*h* u)(t) = ((g * h) * u)(t) = (h * g * u)(t) 

Since, the composite system g * h is LTI we have that, 

z(t) = y{t) -y(t-l)=g* (x(t) - x{t - 1)) = (g * u)(t) 

Now we are in the familiar situation of z = Gu of the standard CS problem, except that G is a 
Toeplitz matrix. Consequently, if the Toeplitz matrix G satisfies the RIP property we can recover 
z using standard tools in CS. Indeed RIP properties of Toeplitz matrices have been studied [11] . 
Note that this idea generalizes to arbitrary but known finite dimensional stable LTI systems, H . 
The main idea being used here is the commutative property of convolutions. 

However, the question arises as to how to deal with unknown system HI It turns out that 
corresponding to every finite dimensional LTI system there is an annihilating filter [12] , If H is a 
pth order linear dynamical system it turns out that the annihilating filter, H 1 - is parameterized by p 
parameters. Now employing commutativity of convolution, namely, g*h = h*g, followed by filtering 
through the annihilator we are left with a linear characterization of the measurement equations. 
We are now in a position to pose a related l\ optimization problem where the parameters are the 
sparse signal z as well as the parameters governing the annihilating filter. Our proof techniques 
are based on duality theory. 

Strictly speaking, for AR models commutativity is not necessary. Indeed, we could consider 
general random projections, but this comes at a cost of increasing the number of measurements 
as we will see later. On the other hand RIP properties for random projections is (provably) 
significantly stronger than Toeplitz matrices. Nevertheless note that in the random excitation 
scenario of Figure O the structure does not lend itself to a random projection interpretation. For 
these reasons we consider both constructions in the paper. 

The paper is organized as follows. The mathematical formulation of the problem is presented 
in Section [2] Section [3] describes the new i\ minimization algorithm. The result for recovery with 
AR filtered processes (Theorem [I]) is stated in this section. The proof of Theorem [I] can be found 
in Section [5] To help the reader understand the main idea of the proof we first consider a very 
simple case and Section [531 provides the proof for the general case. Section [3TT1 addresses the blind- 
deconvolution problem, which can be regarded as a noisy version of our problem. We use LASSO 
to solve this problem and the detailed proof is provided in Section [6j In Section 01 we extend the 
our techniques to two related problems, namely, decoding of ARMA process and decoding of a 
non-causal AR process. Finally, simulation results are shown in Section [7] 
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2 Problem Set-up 



Our objective is to reconstruct an autoregressive (AR) process x(t) from a number of linear and 
non-adaptive measurements. An autoregressive model is known as an "all-pole" model, and has 
the general form 

p 

x(n) + '^2aix(t — i) = z(t) (1) 

i=l 

where z(t) is a sparse driving process. We assume the vector z = [zq, • • • , z n -i] T is /c-sparse, that is, 
there are only k non-zero components in z. The task of compressed sensing is to find the AR model 
coefficients a = [ai, • • • , a p ] T and the driving process z = [zq, • • • , z n -i] T from the measurement y. 
In this paper, we assume that the AR process x(t) is stable, that is, the magnitude of all the poles 
of the system is strictly smaller than 1. In later discussion, we use x% or x(t) interchangeably for 
convenience of exposition. 

Note that in standard CS setup, the signal x is assumed to be sparse in some known transform 
space. However, in our problem, the AR model is assumed to be unknown and the main contribution 
of this paper is to solve this new problem efficiently. 

We consider two types of compressed sensing scenarios: 

2.1 Toeplitz Matrices 

Here we realize m measurements by applying the sensing matrix G to signal x — [^o? * * * ? %n— i] 
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where each entry is independent Gaussian random variable AA(0, 1) or independent Bernoulli ±1 
entries. Here the Toeplitz matrix G preserves the shift-structure of the signal. Roughly speaking, 
assume z' is a shifted version of z (disregarding the boundary effect), then Gz' is also just a shifted 
version of Gz. This is particularly suitable for the random excitation model of Figure [2 

For notational purposes we denote by a;W (or GW) to denote the subvector of x (or submatrix 
of G) that is composed of the last s components (or s rows) of x (or G). By rearranging the above 
Equation [2] and using the shift-property of G, we have the following equation. 
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where we recall that z = [zq, • • • , z n -i] T . Now Equation [3] is simplified to 

Ya + y [m - p] = G [m ~ p] z (4) 
where a = [a±, ■ ■ ■ , a p ] T G MP and z 6 M. n (fc-sparse) need to be decoded from the model. 
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2.2 Random Projections 

Here we consider randomly projecting the raw measurements x(t), namely, 

n-l 

y(t) = ^2 9t,rx(r), t = 0, 1, . . . , m 

where, each entry g^ T is an independent Gaussian random variable A/"(0, 1) or independent Bernoulli 
±1 entry. The reason for choosing random projections over random filters is that IID random 
Gaussian/Bernoulli matrix ensembles have superior RIP constants. The optimal RIP constants for 
toeplitz constructions has not been fully answered. Nevertheless, note that to form the matrix Y 
with random projections requires significantly more projections. This is because we can no longer 
exploit the shift-invariant property of convolutions. For instance, consider again the matrix Y of 
Equation [3J above: if random projections were employed instead of Toeplitz construction the entry 
?/l on row 1 will not be equal to the entry i/i in the second row. This means that for a pth. order 
model we will require m x p measurements. 

Notation: To avoid any confusion, we use u* to denote the true spike train and u refers to any 
possible solution in the decoding algorithm. Similarly, a* represents the true coefficients. 



3 ^-minimization Algorithm for AR Models 

Since the AR model is unknown, standard decoding algorithms (e.g., Basis Pursuit [8], OMP [15] . 
Lasso [14] . etc.) can not be directly applied to this problem. However, we can regard the signal 
(it, a) (the original signal it together with the unknown coefficients a) as the new input to the model 
and (it, a) is still sparse if p (the length of a) is small. 

With this in mind we solve the following l\ minimization algorithm 

min ||d|i subject to Y a + y^ m ~^ = G^ m ~^u (5) 

More generally, when the measurement y is contaminated by noise, that is, the sensing model 
becomes y = Gx + w where w is Gaussian noise, the above LP algorithm will be replaced by Lasso, 

min -\\Ya + y^ m ^ - G^ m ^u\\l + Allulli (6) 

where A is a tuning parameter that adapts to the noise level. 

Alternatively, the coefficient a can be solved from Equation U] by taking pseudo-inverse of Y, 

a = (Y T Y)~ l Y T (G [m ~ p] u - y^ m 'P^ (7) 

Then Equation U] becomes 

(/ - Y(Y T Y)~ 1 Y T )y[ m ~ p } = (J — Y{Y T Y)- l Y T )G [m - p] u 
and similar to Equation we can apply the following i\ minimization to find the solution for it. 

min lliilli subject to Py^ 1 ^ = PG [m - p] u (8) 

u6R n 
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where P denotes the projection matrix / — Y(Y T Y)~ 1 Y T and ||«||i denotes the l\ norm of u. Sup- 
pose the solution of Equation[8]is u. Then a can be easily derived by a = (Y T Y)~ l Y T (G' m ~ p 'n — y[ m ~ p l) 
and the signal x(n) can be recovered through Equation [TJ 

We note that Equation [8] is equivalent to Equation [5] if Y T Y is invertible, which is always 
assumed to be true in this paper. To summarize the above discussion, our algorithm is summarized 
below. 

(1) Inputs: Measurement y, sensing matrix G and order of the system p. 

(2) Compute u and a: Solve the i\ minimization (Equation [5] or [8]) or Lasso (Equation EJ) . 

(3) Reconstruction: Recover the signal x(n) through forward propagation of the AR model of 
Equation [TJ 

Before stating the main result, we recall that for every integer S the restricted isometry constant 
13 EH]> Ss is defined to be the smallest quantity such that G^ 1 p ' obeys 

(1 - 8 s )\\x\\l < \\G [ ™- p] x\\ 2 2 < (1 + 5 s )\\x\\l (9) 

for all subsets T C {0, 1, • • • , n — 1} of cardinality at most S and all {xj)jgr- 

Note that when the AR filter a(n) is known the result is a direct application of standard 
compressed sensing results. We state this without proof below for the sake of completion. In other 
words, if the coefficients a(n) are known, u*(-) is the true driving process in Equation E] then u*(-) 
is the unique minimizer of 

min lldli subject to Ya + y^-'^ = G [m - p] u (10) 

A main result of our paper is the following where a(n) is assumed to be unknown. We need the 
following assumptions before we state our the theorem. 

1. Constant Order: We assume that p, the order of AR process x(n), is a constant (i.e., p 
does not scale with n,m,S). 

2. Exponential Decay: Suppose the impulse response \h(i)\ of the AR model satisfies 

\h(i)\ < Mp { 

for some constant M and < p < 1. 

3. Distance between Spikes: We define the constant I := (log(j^) + plog( 6 ^ axM )^ /log(p _1 )+ 

p and impose the condition that any two spikes, u*, u* satisfy \i — j\ > I, i 7^ j. This implies 
that the sparsity k := |Supp(n*)| < mm{S/l, S/3}. 

4. Spike Amplitude: We also assume that any spike is bounded, /3 m in < \uk\ < /3 m ax, VA; G 
Supp(u*). 

Theorem 1. Suppose assumptions 1-4 above are satisfied. Let the integer S satisfy y~§s~ < 1- V 
u*(-) is the true driving process in Equation[I\ then it is the unique minimizer of 

min Hi subject to Ya + y^ m ~^ = G^ m ~^u (11) 

u£R n ,a£RP 
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Intuitively speaking, the condition in the theorem requires that the driving process u(n) is 
sparse enough and any two spikes (ui,v,j) are reasonably far away from each other. This type 
of assumption is actually also necessary. In section 15. 2| we give an example where two spikes are 
consecutive and show that in this case x(n) can not be solved via equation [U The proof of Theorem 
CD is presented in Section [5l 

Remark 3.1. The reader might be curious as to whether a random convolution train provides 
benefits over random projection. Note that by using random convolutions we can naturally exploit 
shift-invariance property. Since Y G R m_ P x P as in Equation [3] is a partial Toeplitz matrix, we only 
need m output measurements. In contrast for a random projection, since we can no longer exploit 
this property, we would require 0(mp) measurements. 

3.1 Noisy Blind-deconvolution 

We consider the noisy blind-deconvolution problem with IID Gaussian noise, w% ~ jV(0, a 2 ), and 
measurements 



(12) 



In this section we consider the 



y(n) = x(n) + w(n) 

where the process x(n) is modeled by x{n) + 'Y^i=\ (Hx(n — i) = u(n) 
problem of reconstructing the sparse spike train u(n) and coefficients a from the observed signals 
y(n). This problem is called "Blind deconvolution" [2j [16] and it is a simplified version of the 
Compressed Sensing problem where the sensing matrix G is identity matrix. To the best of our 
knowledge, even this simplified problem is still not completely solved in literature. Therefore, we 
focus on the uncompressed noisy version here. The noisy compressed version is technically more 
involved and will be reported elsewhere. 

Replacing x(n) with y(n) — w(n) in the AR model, we get 



l( n ) + X/ a *2/( n ~~ *) = n ( n ) + e ( n ) 



(13) 



i=i 



where we denote e(n) := w(n) + Yli=i a i w i n ~ *)■ 
Again by introducing 
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y + Ya = u + e 
Here Lasso is used to solve the problem: 

min - \\y + Ya — u\\o + A||it||i 
«eM»,oeRp 2 
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We can show that the solution of Lasso is very close to the true a* and u* . Before stating the 
theorem, we first introduce some notation and technical conditions that will be used in the proof. 
We denote the noiseless version of Y as 

••• 

Xq ■■■ 

Xp-i ■■■ x 

Denote the support of u* as /. We define X\ as the matrix comprising of the rows of X indexed 
by / and X^ as the matrix comprising of the rows of X indexed by I c . We also denote x max = 
max; \xi\, u min = mim 6 / \m\ and a max = max; |dj|. 

We assume that the AR process x(n) satisfies the following set of conditions. 



(1) The smallest eigenvalue \ m i n (X 2 T X2) > > 

(2) HXfsgn^DIU < HarllzVKgH, 

2 

(3) x max > 2frVlog n and ^ff < min{ : 



Anpa 2 
(^2-l) 2 



for some constant c > 1. 



Acy/2pn ' y 24cpy'log n 



}• 



In practice, condition (1) is generally satisfied. For instance, if the signal x is persistent, 
TT^jT^Xj X2 converges to a constant invertible matrix. Condition (3) is also standard in compressed 
sensing, which says we need SNR > O(logra). In addition, we also need the assumption that no 
components are dominantly large (compared with the total energy of x). The upper bound for 
^max/H^lb can be relaxed but the current setup simplifies the analysis. 

Condition (2) is new. Let us consider two scenarios. In the first scenario, each spike in ui 
can be either positive or negative with equal probability (i.e. sgn(u/) is Bernoulli ±1). In this 
case, X^sgn(uj) behaves like a sub-Gaussian sum and it is usually upper bounded by ||x||2\/Iogn 
with high probability. On the other hand, let us also consider the case when all the spikes in ui 
are of the same sign, say positive. In this case each entry in Xf and sgn(n}) is positive and the 
inner product of these two aligned signals is typically much larger than the first scenario. This 
phenomena is also illustrated in the experiments shown in Figure El In the experiment, the AR 
model is xt — lAxt—i + 0.45xt_2 = u(t). The blue curve corresponds to the scenario when sgn(iij) 
(ui is a spike) is Bernoulli dbl. The red curve corresponds to the case when the sign of any spike 
Ui is always +1. Each point on the curve is an average over 40 trials. We can see that in the 
first scenario (blue curve) we can tolerate many more spikes. To the best of our knowledge, this 
behavior does not exist in standard compressed sensing problem. 

Theorem 2. Denote P := I — Y(Y T Y)~ l Y T and assume condition (1),(2) and (3) stated above 
are satisfied. We also assume parameter A is chosen such that A > 6apa mauX ^\og n and u m i n > 2A, 
the solution to Lasso [75| is given by 



uj = {PfPjy^Pje 
uic = 
o = -(Y T Y) 



Xsgn(u*j)) + u*j 



l Y T {y 



u 
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Bernoulli-Gaussian Process, 2-order AR model, SNR = 28dB 
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Figure 3: Comparison of two sign conditions for u. The AR model is xt — l.Axt-x + 0.45:Ef_2 = u(t). Noises 
ere added to the measurements and SNR = 28 dB. In one experiment, each sign of each spike is either 
positive or negative with equal probability. In the other experiment, the sign of the spikes is always positive. 

and we have sgn(u*) = sgn(u) with probability at least 1 — 8p/n — (p + l)2 _Tl//5 . 

Remark: The assumption u m i n > 2A implicitly implies an SNR bound O(logn) for the smallest 
spike. The assumption A > 6<rpa max ^/log n ensures A to be sufficiently large so that every non-spike 
element is shrunk to zero by the Lasso estimator. It is hard to analyze the case when parameter 
A is smaller because in this case it is not clear how to construct ujc which is critical for tractable 
KKT analysis. The choice of u in the Theorem [2] is motivated by the proof techniques used in |17j . 
The proof of Theorem [5] is presented in Section 

4 Extensions 

In this section, we provide two interesting extensions to the AR model problem. First, we generalize 
AR model to the autoregressive moving average (ARMA) model, i.e., the process contains both 
poles and zeros in the transform function. Second, we develop an algorithm for the non-causal AR 
process, i.e., the current state not only depends on the past inputs but also depends on the future 
inputs. 

4.1 ARMA model 

The ARMA model takes the form 

P Q 

x(n) + aix(n — i) = u(n) + biu(n — i) (19) 

1=1 8=1 

Again we use Equation[2]to obtain the measurement y = Gx where G is a Toeplitz matrix as defined 
in Section [2l Similar to what we have done in Section [21 we write down the matrix representation 
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of the ARMA model: 



x 
x\ x 

Xp Xp—X 



1 

ax 



J^n—l %n—2 ' ' ' %n—p_ 

We denote the lower triangular matrix B as 



10 

6i 1 

b q ■■■ bx 

• • • b„ 



■■■bx 1 



Ux 

U n -2 
U n -X 



(20) 



D 



1 
&i 1 

b„ ■■■ bx 



) ■■■ b q ■■■b 1 1 
By multiplying G^ m ~ p ' to both sides of Equation 1201 we get 

Ya + y [m ~ p] = G [m ~ p] Bu 



(21) 



(22) 



Note that for ARMA model we have an additional term B compared to Equation HJ Generally, 
matrix B is unknown. We first consider a simple situation when B is assumed to be known to the 
decoder. Based on Theorem Q] we can derive the following result. . 

Theorem 3 (Known Zero Locations). Given the same technical conditions as Theorem [7] and 
assume u* is the original sparse spike train that generates the ARMA process. Then u* is the 
unique minimizer of 



mm 

ueM. n ,w 



\u\\x su 



bject to Ya + y [m - p] = G [m ~ p] B 



u 



(23) 



Proof. Note that B is also a Toeplitz matrix. From the commutativity of Toeplitz matrix, we have 
G [m-p] B = BQ[mr-p] m Section the KKT conditions claim that u* is the unique minimizer 

of Equation [23] if and only if there exists a vector tt such that: 

1. (n T G^ m -^B). = sgn«) for all i G Supp(u*), 

2. | (x T G^ m -P^B) . | < 1 for all j £ Supp(-u*), 

3. tt t Y = 0. 

Applying the commutativity and define tt t = ir T B, the above three conditions are converted to 

1. (tFG^-p]) . = sgn«) for all i G Supp(u*), 

2. | (tFG^-p^) . | < 1 for all j g" Supp(n*), 
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3. % T B~ X Y = 0. 



Note that both the inverse B~ 1 and the matrix Y are Toeplitz. Therefore, from commutativity, 
the third equation is equivalent to ^YB" 1 = 0. Finally, since B~ l is invertible, the last equation 
can be further simplified to tt t Y = 0. Now the KKT conditions look exactly the same as those in 
Section [5j Hence the corollary is proved by following the same argument as in Section □ 

Now we consider the general situation when B is unknown. The difficulty of decoding lies in 
the fact that we know neither B nor the spike train u(n). There might exist different combinations 
of hi and u(n) that matches the measurements y(n). 

Here we propose an iterative algorithm for estimating (u, a, b) in Equation [22j Each iteration 
comprises of two basic steps. First, if B is known (from previous iteration), we can use the following 
l\ minimization algorithm to solve u and a (Theorem [3]) . 



min Mi s.t. 



\Ya + y [m ~ p] - G [m - p] Bu\\ 2 < e 



(24) 



Here e > is required, even though there may not be any noise, to ensure that we do not get stuck 
in a local minima. 

Now once u is determined we switch from u to B, as the optimization variable. This problem 
reduces to a standard regression problem. First we rewrite Equation [22] as follows: 



Ya + y [m ~ p] = G [m - p] 



u 










"1" 


Ui 


Uq 















h 












U q 


U q -i ■ ■ 


u 














A. 


Un-l 


U n -2 ■ ■ 


U-n— q— 1_ 





which can be simplified to Ya + y^ m p l = G^ m p ^u + G^ m p ^Ub where we denote 



U 





»o 

Uq-l 



UQ 



_Un-2 • • • U n -q-l_ 

Now we formulate the following least squares optimization problem: 
min \\Ya + y [m - p] - G [m ~ p] u - G [m - p] Ub\\ 2 



(25) 



In summary our iterative algorithm consists of the following steps: 
Initialization: Set &(°) = 0, i.e., B^ = I. 
Iteration k: Compute , , b^ 
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0.6 - 
0.4 - 
0.2 - 



step 30 



step 20 



step 60 



0.1 0.2 0.3 0.4 0.6 0.6 0.7 

b. 




Figure 4: The iterative algorithm on the model x(n) — 1.9x(n — 1) + 1.06a;(n — 2) — 0.144a;(n — 3) = 
u(n) + 0.7u(n — 1) + u(n — 2) where the correct b = [0.7 1]. Left: In trajectory of b in each round of 
iteration; Right: Zoom-in of the final stages of the iterations. Blue * corresponds to the rounds of updates 
with e = 3 while red * corresponds to the rounds of updates with a smaller e = 0.3 in the final stage. 

1. Update and via solving Equation [211 with B = 

2. Update b^ k > via solving least-square (Equation I25|) with (u,a) = (u^ k \a^). 

There is a subtlety in the choice of parameter e in Equation [25J If e is large, the iterative 
algorithm appears to have a faster convergence rate but at the cost of significant bias. On the 
other hand, if e is small, the convergence rate is slow but the solution has small bias. Therefore, 
in practical implementation we choose e to be reasonably large in the early stages of the iteration 
and then decrease it to e/10 at the later stages of the iteration. 

Figure |4] illustrates a concrete example of solving the ARM A model x(n) — 1.9x(n— 1 ) + 1 . 06x (n — 
2) — 0.144x(n — 3) = u(n) + 0.7u(n — 1) + u(n — 2) by using our iterative algorithm. We choose 
e = 3 in the first 50 rounds of iteration and finally in the last 10 rounds of updates we set e = 0.3. 
Figure \Mi>) is a zoom-in version of Figure UK a) which shows the final stage of the algorithm. We 
can see the effects of choosing different value of e as well. 



4.2 Non-causal AR model 

Many real world signals are non-causal. For example, a 2D image is usually modeled by a Markov 
random field, where each pixel is dependent on all its neighboring pixels. In this subsection we 
consider this situation by modeling the signal to be a non-causal AR process. 
A non-causal AR model is defined as 

p p 

x{n) + aix(n — i) + a-ix(n + i) = u{n) (26) 

i=l i=l 

A typical non-causal AR process is shown in Figure Here the impulse response of each spike 
is two-sided as opposed to the one-sided impulse response of causal AR process. In this subsection, 
we discriminate between two boundary conditions for the non-causal AR process. As we will show 
later, there are subtle differences in dealing with these two boundary conditions. 
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A non-causal AR process 




200 



Figure 5: A typical non-causal Autorcgrcssive process: x(n) — 0.375a;(n — 1) — 0.5x(n + 1) = u(n) 



1. Boundary is circulant, i.e., xq = x n ,x\ = x n+ i, • • • ; 

2. Boundary is not circulant. 

4.2.1 Circulant Boundary 

In this case we use the following circulant matrix in the sensing model y = Gx. 



G 



9n-m 9n-m-l " ' 90 9n-l 9n-2 " " " 9n-m+l 
9n-m+l 9n-m ' ' ' 9l 90 9n-l ' ' ' 9n-m 



9n-l 9n-2 



9m-l 9m-2 9m-3 



9o 



(27) 



where gi is i.i.d Gaussian random variable A/"(0, 1) or Bernoulli ±1 random variable. 

Since the boundary of x is circulant = x n -i), we can write the matrix representation of 
Equation [26] as 



x 




Xn—p 


Xl 


Xp 




' 1 " 

a t 




u 


Xl 


XQ 


Xn—p+1 


X2 


■ x p+ i 






Ul 


Xp 


Xp—l 


X 


Xp+l ■ 


X2p 




a p 
a_i 




Up 




X n -2 ■ ■ 


Xn—p 


XQ 


Xp—1_ 




a- p 




Un-l_ 



(28) 



With an abuse of notation, we use to denote the submatrix of G comprising rows i-th. 

through j-th of G. Now we multiply Q[p+ 1:m ~p] to both sides of Equation [28] we get the following 
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equation. 



Vv Vp-i 
Vp+i V v 

V2p V2p-1 
Vn—p Un—p—l 

We define matrix Y to be 



Y 



yo 
yi 



Vp+i 

y P +2 



Vp V2p+1 



Vn—2p Vn—p+1 



V2p 
V2p+1 

V3p 

Vn 



1 

Ol 



a_i 



pj 



Qlp+1-.m-p]^ 



y P -i 
y P 

V2p-1 



yo 
yi 



y P +i 
y P +2 



y2 P 

V2p+1 



y P V2 P +i ■■■ y3 P 



JJn—p—l ' ' ' y-n—2p 2/n— p+1 

and finally Equation [29] is simplified to 

Jp+lm-p] _|_ Y a = Q[p+l-m-p] u 



l/n 



where a = [a±, ■ ■ ■ ,a p , a_i, • • • , a_ p ] G 

As in Section [3]we can use either .^-minimization or Lasso to solve this problem. 



"\ -minimization: min ||m 



|i s.t. y [P+ 1:m ~P] + Ya = Gfe>+ I:m -P], 



(29) 



(30) 



Lasso: min + Ya - G [p+V " ] uf 2 + Alltilli 

neK n ,aeIR 2 ^ , 2 

4.2.2 Non-circulant Boundary 

The case of non-circulant boundary is slightly more complicated. There are two ways of handling 
this situation. A simple approach is to view the problem as a perturbation of the circulant boundary 
case, namely, 

y ]p+l:m-p] + fa + e = G lp+1:m ~ p] u 

where 



Q\p+l:m— p] 



%—l %n— 1 









X. 



X 



n—p 



p+l •^n—p+l 
















-2 - Xp-2 
%n+p— 1 %p— 1 



^n+p- 



Now one could use Lasso to solve this noisy model: 



mm 

«eM n ,a6R 2 P 2 



ly\p+i:m-p] +Ya - G^ +v - m ^u\\l + A||u||i 
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Unfortunately, this approach will have a bias. To overcome this limitation, we consider the case 
where we can make an additional 2p set of measurements corresponding to the boundary conditions, 
namely, 



Um+l — X— p, 

Then by the denoting 



j n—pi j 



Vm+2p — ZCn—l- 



Y ~ Y + Q[p+ 1 - m -p] 












^n—p 



•E—p+1 -En—p+1 











•En -^0 



%n+p— 2 %p— 2 
Xn+p—1 %p— 1 



the sensing model can be simplified to the noiseless version 

y\p+l:m,-p] _|_ Y a = Q\P+~L'-m-p] u 

Again we can use either ^-minimization or Lasso to solve this model: 

^-minimization: min lluL s.t. y^+ v - m -^ + Ya = G [p+1:m " p] n 

Lasso: min I|| y b+ 1;m -p] + Ya - G^ +v - m -^u\\l + A|lul|i 

neK n ,aeIR 2 P 2 

5 Proof of Theorem [I] 

We first write down the primal and dual formulation of algorithm [5j 

min llulli subject to Ya + y lm ~ p] = G [m ~ p] u 

whose dual formualtion is: 

max vr T ?/ m ~ pl subject to \W T G [m " p] IU < 1, ir T Y = 



(31) 



(32) 



The proof is based on duality, u* is the unique minimizer of the primal problem [31] if we can 
find a dual vector tt with the following properties: 

1. (iFG^-A). = sgn(u*) for all i G Supp(u*), 

2. | (7r T G^ m -^) j | < 1 for all j Supp(u*), 

3. tt t Y = 0. 

where sgn(u*) denotes the sign of u* (sgn(u*) = for u* = 0) and Supp(u*) denotes the support of 
vector u* . The above set of conditions ensure that the primal-dual pair (u* , tt) is not only feasible 
but also satisfy the complementary slackness condition, thus optimal. We call the above three 
conditions as the Dual Optimal Condition (DOC). 

The rest of this section is to construct a tt that satisfies the DOC. Our construction relies on 
the following result (see [15]). 
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Lemma 4 (|15j). Let S > 1 be such that 62s <• \, an d c be a real vector supported on T obeying 
\T\ < S. Then there exists a vector it G R m such that (ir T G^ m ~ p ^) . = CjVi € T. Furthermore, ir 
obeys 



< 



S S 



(1 - 35 2S )VS 



\ch Vj^T 



This lemma gives us the freedom to choose (arbitrarily) the value of ir T G^ m p l in the location 
of T while the magnitude of the rest components is still bounded. 



5.1 One Pole Case 

In this section we provide a proof for the simple case when x{n) is a first order AR process (i.e., 
p = 1) and u* only contains one spike (i.e., every entry of u* is zero except one place). Though 
simple, it contains the main idea of proof techniques for the more general case. Note that in this 
simple case the assumptions in Theorem Q] are automatically satisfied. 

For the 1-sparse driving process u* , without loss of any generality we assume uj$ = 1 and 
u* = (Vi > 1). We also denote a = — a as the root of the characteristic function of the first order 
AR process. Due to stability we have |a| < 1. Now in condition 3 of DOC, the term tt t Y can be 
recast as 



tt t Y 



7T 



T Q[m—p] 











x 


= TT T G [m - p] 


1 


Xn-2_ 







In Lemma |U we choose c as cq = 1, c\ = 1/2 and c,- = (J = 2, ■ ■ ■ , S — 1). Then Lemma d] 
tells us that there exists a tt\ such that (7rf G^- m ~ p ^) i = q (Vi = 0, • • • ,S — 1) and furthermore 



< 



S s 



This implies 



vrf Y = vrf G^ 




1 



a" 



1 



n-l 
3=S 



3-1 



(33) 



where the summation 



< 



summarize the above discussion, we find 7Ti such that: 

1. (ttJG^) q = 1 

2. I (7r'[G^ m ^) j I < 1 for all j > 1, 

3. sgn(vrf Y) = 1. 



2|a|* 



VS(l-\a\) 



<C ^. Therefore sgn(-7rfy) = 1. To 
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Similarly, by choosing cq = 1, c\ = — 1/2 and Cj = (j = 2, • • • , S — 1) in Lemma 01 there 
exists a 7T2 such that condition 1 and 2 of DOC are also satisfied while sgn(7r^y) = — 1. Hence, by 
convexity there exists a A G (0, 1) such that for it = Xtti + (1 — A)-7T2, it satisfies ir T Y = and also 
condition 1 and 2, i.e., the whole DOC. 

Finally we find a primal-dual pair (u* , tt) that satisfy all the feasible constraints and also the 
complementary slackness condition, which implies u* is the unique minimizer of the primal problem 
equation [3TJ 



5.2 General Case 

In this section we prove that in the general case the three conditions in Theorem [T] ensures the 
existence of a tt that satisfies the DOC. Before giving the proof, we point out that if some conditions 
in Theorem [T] are violated, there might not exist such a tt. Let us consider the case of p = 1 (first 
order AR process) and k=2 (only two entries of u{n) are nonzero). Moreover, we choose Mq = u\ = 1 
and u* = (Vi > 1), that is, the two spikes are next to each other. 



In this case [xq, x\, ■ ■ ■ , x n _i] = [1,1 + a, a(l + a), 



a 



n-2 



(1 + a)] T . We pick a = -1/2. 



Clearly the assumption \i — i'\ > i, Vi, «' £ Supp(u*) in Theorem Q] is broken. On the other hand, 
we can also check that there does not exist a it that satisfies the whole DOC condition. In fact, 
suppose 7r is chosen such that condition 1 and 2 are satisfied, then in checking condition 3 we find 



vr T Y = 7r T G^ 




1 

1 + a 



a 



n— 3/ 



1 + a) 



n-l 



n-1 



'[m— p] 



J=2 



j ~ 2 (l + a) > 1 - |«r 2 (l - |a|) > 

J=2 



which violates condition 3 in DOC. Hence there does not exist a ir that satisfies all the three 
conditions in DOC. 

Before proving Theorem [lj we need the following lemma in constructing tt. 

Lemma 5. Suppose the assumptions in Theorem [7] are satisfied. Denote T = {j + i : j G 
Supp(u*),0 < i < I}. Then \T\ < S and we also have the following inequalities: 



(1) Vj T and i = 0, 1, • • • ,p, \xj-i\ < 



f-p 



l-p 1 



(2) Vk G U p i=1 {j -i:je Supp(u*)}, \x k \ < ^f^4 



Jin (l-P') _ J 



(3) Vi = 0, 1, • • • ,p — 1 and Vj G Supp(u*), \xj/xj+i\ > r where r := '~'T' V p - — p" ■ 

Proof. First, from the assumption of Theorem [H |T| < 5. Then we need to verify the three 
properties. 

Suppose Uk is a new spike and k' be the next spike. Given i < (k' — k), we clearly have 



\Xfc+i\ < /3maxW(l +P l +P M + ■■■)< 



1 - p l 



17 



Hence properties (1) and (2) follow. 

We denote e := ^™** f J ■ Therefore, for any j G Supp(u*), \xj\ > \uj\ — ep l > f3 m i n — ep l . 
Combining with the above argument, we have 

\Xj/x j+i \ > (Anin - ep )/e = — — p 

Pmax-'" 

Note that when p l < 3 ^ m '" M as given in the theorem assumption, we have r > 3 / min ^ ■ D 

Remark: Property (1) in Lemma [S] says that many components of x(n) are small. Property (2) 
ensures that before a new 'spike' uj begins (j € Supp(u)), the amplitude of Xj— P ,--- ,xj—\ is al- 
ready negligible (i.e., very close to zero) such that the new impulse response caused by Uj can be 
regarded as starting almost from zero level. Finally, property (3) says that when a new spike Uj 
arrives, the corresponding output Xj is reasonably large compared to its neighbors. 

Now we are ready to prove Theorem [U Similar to the last section's argument, the objective is 
to find a sequence of vectors ttx, ■ ■ ■ , tt2p such that any of ir s (s = 1, • • • , 2 P ) satisfies the condition 
1 and 2 of DOC while 

sgn(vrf Y) = [1, 1, • • ■ , 1] T 
sgn(Tr^y) = [-1,1,-.. ,1] T 

sgn^Y) = [-1,-1,... , -if 

and this implies there exists a convex combination ir = X^s=i ^s^s which satisfies ir T Y = and 
also the condition 1 and 2 of DOC. 

Based on Lemma[U we construct tt\ via fixing the values of {(vr^ ' G^ m ~ p ^)i\i^T '■= {ci}i^T- 

{sgn(ii*) if i G Supp(u*) 

(r/2)' l ~ j - 1 sgn(x j ) if i = j + 1, • • • , j + p,Vj G supp(u*) (34) 
if i = j +p + l, ••• + G supp(u*) 

This choice of c gives the bound ||c||2 < y/k + k(l + 2 _1 + 2~ 2 + • • • ) < v3fc. Now by applying 
Lemma HI we know there exists a it\ such that (ttJG^™ 1 "^ . = Cj when i£T and 



irfG [m - p] 



<J*<^/f<i, «*t 



where the last inequality follows from the assumption of the Theorem. Up to now we have shown 
that 7Ti satisfies condition 1 and 2 of DOC. Next we will check the sign of Trf Y. 
Fort = l,2,--- ,p, 

jo+p 

j=jo+l J j£T or jeSupp(u*) J 

io+p / io+t-i 

j 6Supp(n*) j=jo+t \j GSupp(u*) j=jo+l j^T or jeSupp(u*) 
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where the magnitude of At can be lower bounded, 

\A t \ > Y, (r/Zy-^mUl - 2" 1 - 2" 2 2-(P~*)) > fc/W/^ 1 

j eSupp(u*) 

based on property (3) of Lemma [5j And the magnitude of B t is upper bounded, 

|Bt|< £ + ' + ' +-) = fe (l- /3t )(l- p) 

ioeSupp(n*) 

2 \ , _i n ./6fc s M^ /l„„/„-l 



When / > |^log(j :r ^) +plog( p ^" ax )J /log(p ) +p as given by the assumption of the theorem, we 

have \Bt\ < \At\, which implies that the sign of (TrjY)t is determined by the sign of At. 
Hence sgn((7r^y)f) = sgn(^LJ = 1. This implies 

sgn(7rf Y) = [1, 1, • • • , 1] T 

In general, for any sign pattern [s\, • • • , s p ] T (si S {—1, 1}), by choosing {ci}i & T (compare equa- 
tion [33]) in the following way 

{sgn(ii*) if i e Supp(u*) 

Si ■ (r/2) i -^- 1 sgn(x i ) if i = j + 1, ■ ■ ■ ,j+p, Vj E supp(u*) 
if i = j + p + 1, • • • , j + I, \/j e supp(u*) 

and making similar arguments, we have 

sgn(vrfy) = [s 1 ,S2, ■ ■ ■ ,s p ] T 

6 Proof of Theorem [2] 

To prove Theorem [2J we only need to check that (u, a) given in the theorem satisfy the KKT 
conditions. We denote the function f(u,a) = ^\\y + Ya — + A||u||i. Then the gradient of / with 
respect to a is 

^- = Y T {y + Ya-u) 
and the subgradient of / with respect to u is 

df 

tt- = - (y + Ya - u) + Xv 
ou 

where v satisfies vi = sgn(uj) for i £ I and \vi\ < 1 for i G I c . Therefore, we only need to check the 
following set of (in)equalities 

Y T (y + Ya-u) = (35) 
(y + Ya-u)i = Asgn(tii), in / (36) 
\{y + Ya-u)i\ < A, in = (37) 

We first check Equation 1351 
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Lemma 6. Equation{35\is satisfied with (u,a) given in Theorem^ 
Proof. Actually, 



Y 1 (y + Ya-u) = Y (y — u) + Y Ya 

rT/_. ~\ vTvrvTw-lvT 



T {y-u) + Y T ^ 
Y T (y -u)- Y T Y(Y 1 Y)~ L Y 1 (y 

□ 



Next we check Equation 



Lemma 7. Equation [221 is satisfied with (u, a) given in Theorem with probability at least 1 — 
8p/n- (p + l)2~ n / 5 . 

Proof. Note that P has the property that P 2 = P and PY = 0. Therefore by multiplying P to 
both sides of Equation [T4l we have 

Py = Pu* + Pe (38) 

Now we can compute 

y + Ya-u = y-u-Y(Y T Y)- l Y T {y-u) 



(a) 
(6) 



P(y - u) 

Pu* + Pe-Pu 

Pe + P 7 (u}-n 7 ) 

Pe - Pj(Pf P/)" 1 (Pf e - AsgnK)) 



where (a) follows from Equation [38] and (b) follows from the fact that ujc = u* IC = 0. 

There is a small trick here. Since y + Ya — u = P(y — u) as we have shown and P 2 = P, we 
must have P(y + Yd — u) = y + Yd — u. This implies Equation [36] is correct: 

(y + Yd - u)/ = Pf(y + Yd - u) 

= Pf {Pe - P^pTPi)" 1 {Pie - Asgn(^))) 

( = } pjp e - pj e + Asgn(n^) 
= Asgn(n^) = Asgn(u/) 

where (a) follows from PjP = Pj (i.e., P 2 = P) and the last equality holds true with probability 
at least 1 — 8p/n— {p+ l)2~ n / 5 . The proof of last equality is similar to the proof of Lemma [TOl and 
is omitted here. □ 

Verifying inequality 1371 requires more effort. We first simplifies the formula for [y + Yd — u)ic 

Lemma 8. With (u,a) given in Theorem^ we have 

(y + Ya- u)jc = -Y 2 (Y 2 T Y 2 )- 1 Y 1 T As 5 nK) + (I ~ Y 2 {Y?Y 2 y 1 YT)e I c (39) 

where we denote Y\ as the submatrix comprises of the rows ofY indexed by I and Y 2 as the submatrix 
comprises of the rows ofY indexed by 1°. 
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Proof. Following from the proof of Lemma we have , 

(y + Ya- u) T c = Pf c (P - P I {PfP I )- x P[)e + P%Pi{Pf P/)- 1 Asgn«) 



(40) 



To simplify the above equation, we introduce Pn £ M. kxk as the matrix comprises of the rows 
of P indexed by / and the columns of P indexed by I. Similarly, P12 S ^( n ~ k ) xk is the matrix 
comprises of the rows of P indexed by / and the columns of P indexed by I c ; P22 £ R( n_fc ) x ( n_fc ) 
is the matrix comprises of the rows of P indexed by 1° and the columns of P indexed by I c . By 
this definition, after some column and row permutations, P can be rewritten as 
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P 
P T 



Pl2 
P22 



It is easy to check that Pn = Pj Pj and P^ = Pf c Pi (since P 2 = P). Furthermore, 



pT p— 1 pT 
- r 12- r ll 



[PS P22] - PfsPn 1 [P11 

= [0 P22 — P 1 ^P 11 1 Pl2] 

Hence, Equation [40] can be simplified to 

(y + Ya- u) IC = (P 22 - P 1 T 2 P n 1 P 12 )e /c + AP 1 T 2 P n 1 sgn(nJ) 
We note that Pn, P12, P22 can be expressed in terms of Y, Y\ and Y2. 

P n = I — Y\(Y T Y)~ l Yi 

P12 = -liO^Y)- 1 ^ 
P22 = ^ - Y 2 {Y T Y)~ 1 Y 2 

Moreover Pj^ 1 can be derived via matrix inversion lemma: 

p-i = (j-n^y)-^)- 1 



Finally, we get 

AP 1 T 2 Pr 1 1 sgn(^) 



(41) 



/ + Yi(Y T Y - Y?Y X )- X Y? = 1 + Y 1 (Y 2 T Y 2 )- 1 Y^ 
= -Y 2 (Y T Y)- X Y?(I + Y^Y^Y? )Asgn(«J) 

= -y 2 [(y T y)~ 1 + (y T y)- 1 y 1 T yi(y 2 T y2)- 1 ]y 1 T As g n(4) 
= -y 2 (y t yy x ^ 2 t y 2 + y 1 T y 1 ](y 2 T y 2 )- 1 y 1 T AsgnK) 

( = } -y 2 (y 2 T y 2 )- 1 y 1 T As g n(n}) 

where (a) follows from the fact that Y T Y = Y 2 Y 2 + Y^Y\ . And similarly by repeatedly using this 
fact we can find the following simplification 



(P22 - Pr2PfiPi2)e/c 



(l-Y 2 {Y T YY x Y 2 T )e lc 

-(y 2 (y T y)- 1 y 1 T (/ + y 1 (y 2 T y 2 )- 1 y 1 T )y 1 (y T y)- 1 y 2 T ) e/c 
(/-y 2 (y 2 T y 2 )- 1 y 2 T ) e/ c 



□ 
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In order to justify the condition [371 we also need the following lemma. 
Lemma 9. The following three claims hold true: 

(i) w.p. at least 1 — p ■ (4/n + 2~ n / 5 ), ||Y 2 T e/ c ||oo < 2^n\^^aa m ^x max ^j2p. 



(ii) w.p. at least 1 — 4p/n, \\Yj Xsgn(zj)\\ 00 < 2A||x||2\/log n 

(iii) w.p. at least 1 - 2^, A max ((Y/Y.r 1 ) < 2A max ((XfX,)- 1 ) < ^ 

Proof. To prove (i), we try to bound the first component (Y^e/c) By definition, the first column 
of Y equals [0, y 0) • • • , Un-2] T = [0, xq, ■ ■ ■ , x n _ 2 ] T + [0, w , ■ ■ ■ , w; n _ 2 ] T . We also remember = 
Wi + X2?=i a i w i-i where Wi are i.i.d. Gaussian j\f(0,a 2 ). Hence, we have 



(Y 2 T e/ c ) 1 = ^ + ^ a 3 w i-j) + ^2 w i-i(. w i + ^2 a J w i-j) 

i£l c j=l i£l c j=l 

It is easy to check that the first term of RHS is zero-mean Gaussian random variable with 
variance < po max £ max ^<7 2 - It ^ s we ^ known that for standard Gaussian random variable t, Pr(|i| > 
a) < 2e~ a I 2 . So we conclude that with probability > 1 — 2/n 



} J Xi-l(Wi + ^ a 3 W i-i, 
i£l c j=l 



5^ 0~ (ZmavX 



max ^ max 



a/ 2pn log 



n 



It also can be proved that with probability > 1 — 2/n — 2 n / 5 



^ Wi-l{Wi + ^2 a 3 w i-j) 
iel c j=l 



< 2pa max <T 2 i/nlog 



n 



We notice that cra raax _x raax y/2pn log n > 2pa nmx a 2 V n log n and hence claim (i) follows. 
Next, we prove claim (ii). Again, (Y 1 T Asgn(2;|)) 1 can be decomposed into two terms; 

(y 1 T Asgn(z})) 1 = (X 1 T Asgn(^)) 1 + ^ ^_ 1 Asgn(zJ) 

The first term is bounded from the assumption and the second term is Gaussian which is bounded 
by \ay/2n log n < A||a;||2 \/log n (assumption (3) in Subsection I3.1j) w.p. > 1 — 2/n. 

For (iii), we only need to show that with high probability A m j n (Y^Y^) > ^A m i n (X-X?), or 
Cmin (Y 2 ) > -^(Jmin (X2) where <7 mm (A) denotes the smallest singular value of A. 

We denote the Gaussian noise matrix 



W 





W p -l 






w 



W n -2 



w 



n—p_ 
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and call W 2 as the submatrix that comprises of the rows of W indexed by I c . Then, we have 

(T m\ n (Y 2 ) = min ||y 2 t|| 2 = min ||X 2 t + W2*|| 2 
||*||2=1 ll*lla=l 

> min ||X 2 t|| 2 - max \\W 2 t\\ 2 = o- min (X 2 ) - <r max (W 2 ) 

= 1 ||f||2 = l 

So the remaining work is to upper bound o~ ma , x (W 2 ). A tight bound in this case is very difficult. 
However, the following bound is good enough for our proof. By denoting W 2 i as the i-th column 
of W 2 , we have 

G>"max(W 2 ) = max II W2* II 2 
P 2 = 1 



max i jY^{W 2 ,i,t)' 



where the second last inequality follows from Cauchy-Schwartz inequality. Then by the tail prob- 
ability of x 2 distribution, we have with probability 1 — 2~ n / 5 , 



o- max (W2) < y^Mll < V 2npa 2 
Then by applying assumption (1) in Subsection 13.11 we have proved the claim (iii). □ 
Finally, we can show that (u, a) satisfies the condition 1371 

Lemma 10. Equation [35\ is satisfied with (u,a) given in Theorem^ with probability at least 1 — 
8p/n- (p + l)2~ n / 5 . 

Proof. From the tail probability of standard Gaussian Pr(|f| > a) < 2e _a2//2 , we know that with 
probability at least 1 — 2/n, maxj \wi\ < 2cr\/log n. Therefore the l 2 norm of all the rows of Y 2 
is upper bounded ^Jp{x m&SL + 2ayf\ogn) with probability at least 1 — 2/n. Combined with claim 
(iii) in Lemma [91 we know that the l 2 norm of all the rows of Y 2 {Y 2 T Y 2 )~ l is upper bounded 
™(x max + 2aVlog^) < 4c vfi^ max with probability at least 1 - 2/n - 2" n / 5 . 

Il x ll2 Flh 

Now we can verify that both -Y 2 {Y^Y 2 y 1 Y^ \sgn(uj) and (I - Y 2 (Y 2 T Y 2 )~ 1 Y 2 T )e I c are small. 
First, based on claim (ii) in Lemma El with probability at least 1 — 2 "t p — 2~ n ^ 5 

II - >2(y 2 T y 2 )- 1 Y 1 T Asgn(n})|| 00 < 4C ^f^ max • 2A||x|| 2 Vlog^- VP < A /3 

\\ x \\ 2 

where the last inequality follows from condition (3) in Subsection 13.11 

Next, it is easy to bound He/cjloo < 2<7pa max \/logn < A/3 with probability at least 1 — 2/n. 
Also, we have with probability at least 1 — — (p + l)2~ n / 5 



1 1*2 1*2 Y 2 )~ Y 2 e/ c ||oo < rj — 2Vnlogn<Ta ma xXma X V 2 P - Vp < V 3 

II 3 - II 2 

where the last inequality follows from claim (i) of Lemma [9l condition (3) in Subsection 13.11 and 
the assumption A > 6apa max ^log n. □ 
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Figure 6: ^-minimization algorithm on the model y = Gx + w with G an 80 x 200 Tocplitz Gaussian matrix 
ensemble. The filtered process x(n) is obtained by filtering a 8 sparse spike train through a third-order AR 
process with poles ol\ = 0.9, a 2 = 0.5 and a.3 = 0.2. The measurements were contaminated with zero mean 
Gaussian noise with variance 0.1. 



Gaussian sensing matrix, n=200,m=50,p=2 Bernoulli sensing matrix, n=200,m=50,p=2 




# fraction of non-zeros entries in u # fraction of non-zeros entries in u 

(a) (b) 

Figure 7: l\ -minimization algorithm on the model y = Gx with G an 50 x 200 Toeplitz matrix with 
independent Gaussian or Bernoulli entries. In this experiment x(n) is a second-order AR process with poles 
Q'i = 0.9 and a 2 = 0.5; (a) success rate when G is Gaussian Af(0, 1); (b) success rate when G is Bernoulli 
±1. 

7 Numerical Experiments 

We present simulations for some interesting cases. Theorem [I] asserts that as long as RIP is satisfied, 
stability assumptions on H hold, and the spikes are well separated, our ^-minimization algorithm 
reconstructs the AR process correctly. For general IID Gaussian or Bernoulli matrix ensemble (not 
Toeplitz), it is well known that [9] m > 0(S\og(n/S)) ensures good RIP property. However, for 
our specific Toeplitz structured sensing matrix (Equation [2]) , this question (when RIP is satisfied) 
has not been fully answered. 

We nevertheless experiment with Toeplitz constructions. First we simulate our algorithm for a 
third order process. The results are depicted in Figured! We see that the reconstruction reproduces 
both the spike train as well as the filtered process accurately. For the purpose of depiction we added 
a small amount of noise. 

First, we fix the size of sensing matrix (m = 50, n = 200) and choose the entries of sensing 
matrix G to be Gaussian. We also fix the order of the AR model (p = 2) and let the sparsity 
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Gaussian sensing matrix, total variation model 



Bernoulli sensing matrix, total variation model 




Figure 8: l\ -minimization algorithm on the model y = Gx with G an 50 x 200 Toeplitz matrix with 
independent Gaussian or Bernoulli entries. In this experiment x(n) is total variation process x{n) — x{n— 1) = 
u{n); (a) success rate when G is Gaussian A/"(0, 1); (b) success rate when G is Bernoulli ±1. 



k vary from 1 to 20. For each fixed k, we run our ^-minimization algorithm 50 times to obtain 
the average performance. The result is shown in Figure [7(a). Similarly, we can choose the sensing 
matrix G to be Bernoulli ±1 and do the same experiment again. The result is shown in Figure 
[7](b) . We can see that in this example Toeplitz Bernoulli matrix is more preferable than Toeplitz 
Gaussian matrix. 

Next, we run our algorithm on a case that does not satisfy our assumptions on stability. Specif- 
ically we consider the situation when the true process is governed by the equation x(n) — x(n— 1) = 
u(n). This type of model is closely associated with problems that arise when one is interested in 
minimizing total variations. Note that in this model a = 1 and it does not satisfy the assumptions 
of Theorem [1] where we assume a max < 1- We adopt the same sensing matrix as the last experiment 
(Gaussian or Bernoulli) and the empirical success rate of this experiment is shown in Figure EJ 

Finally we test how the order of the AR process influences the performance of the algorithm. 
In this experiment, we fix the size of the sensing matrix as 80 x 200 and also fix the sparsity 
k = 10 (i.e., the # fraction of nonzero components in z is 5%). We let p (order of the AR process) 
vary from 1 to 15. Figure [9(a) shows that empirical success rate for the Gaussian sensing matrix 
and Figure [9(b) shows that success rate for the Bernoulli sensing matrix. We can see that again 
Bernoulli Toeplitz matrix outperforms the Gaussian Toeplitz matrix. 
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Gaussian sensing matrix, n=200,m=80,k=10 



Bernoulli sensing matrix, n=200,m=80,k=10 




Figure 9: l\ -minimization algorithm on the model y = Gx with G an 80 x 200 Toeplitz matrix with 
independent Gaussian or Bernoulli entries. In this experiment the order of the AR process x(n) is a variable, 
i.e., p € [1, 15]; (a) success rate when G is Gaussian Af(0, 1); (b) success rate when G is Bernoulli ±1. 
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Abstract 

Suppose the signal x € R" is realized by driving a k-sparsc signal u € R" through an 
arbitrary unknown stable discrete-linear time invariant system H, namely, x(t) = (h * u)(t), 
where h(-) is the impulse response of the operator H. Is x(-) compressible in the conventional 
sense of compressed sensing? Namely, can x(t) be reconstructed from small set of measurements 
obtained through suitable random projections? For the case when the unknown system H is 
auto-regressive (i.e. all pole) of a known order it turns out that x can indeed be reconstructed 
from 0(fclog(n)) measurements. We develop a novel LP optimization algorithm and show that 
both the unknown filter H and the sparse input z can be reliably estimated. 

1 Introduction 

In this paper we focus on blind de-convolution problems for filtered sparse processes. These types of 
processes naturally arise in reflection seismology [?] . The LTI system H is commonly referred to as 
the wavelet, which can be unknown, and serves as the input signal. This input signal passes through 
the different layers of earth and the reflected signal z corresponds to the reflection coefficients from 
the different layers. The signal z is typically sparse. The reflected output, which is referred to as 
the seismic trace, is recorded by a geophone. Other applications of filtered sparse processes include 
nuclear radiation [?], neuronal spike trains [?] and communications [?]. 

Specifically, a sparse input u{t) is filtered by an unknown infinite impulse response (IIR) discrete 
time stable linear filter H and the resulting output 

x(t) = (Hu){t) = J2u(n)h(t - n) 

i 

is measured in Gaussian noise, namely, y(t) = x(t) + n(t) for t = 0, 1, . . . , N. The goal is to detect 
z(t), and estimate the filter H. The main approach heretofore proposed for blind de-convolution 
involves heuristic iterative block decomposition schemes (first proposed in [?]). Here the filter and 
sparse inputs are alternatively estimated by holding one of them constant. While these algorithms 
can work in some cases, no systematic performance guarantees currently exist. We explore a convex 
optimization framework for blind de-convolution. 

"This research was supported by NSF CAREER award ECS 0449194 
' Author Names Appear in Alphabetical Order 
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In addition we consider the compressed sensing problem, namely, x(t) is compressed by means of 
a random Gaussian filter ensemble, as described in Figure ?? and the resulting output is measured 
noisily. Analogously, we can consider a random excitation model as in Figure ??. Our task is to 
detect z(t) and estimate H. Our goal is to characterize the minimum number of random samples 
required for accurate detection and estimation. 
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Figure 1: Compressed blind de-convolution. 
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Figure 2: Estimation of filtered sparse process: Random Excitation. 



1.1 Comparison with Compressed Sensing 

Note that this is significantly different from the standard Compressed sensing(CS) [?, ?] problem. 
In standard CS we have a signal or image, x S M. n , which is sparse in some transform domain. 
Specifically, there is a known orthonormal matrix H such that the transformed signal z = H T x is 
k-sparse, namely, has fewer than k non-zero components 1 . A matrix G 6 jj mxn then maps x to 
measurements y = Gx = GHu. For suitable choices of matrices G, such as those satisfying the so 
called Restricted Isometry Property (RIP), the k-sparse signal z can be recovered with 0(k\og(n)) 
measurements as a solution to a convex optimization problem: 

min||n||i subject to y = GHu 

This result holds for all sparsity levels k < an, a < I, for sufficiently small a. There has been 
significant effort in CS in recent years leading to various generalizations of this fundamental re- 
sult. This includes the case when the signal x is approximately sparse (see [?, ?]) and when the 
measurements are noisy, i.e., y = GHu + e (see [?]). 

This paper is a significant extension of CS to cases where H is not only not orthonormal but 
also arbitrary and unknown. Specifically, H, is a causal discrete linear time invariant system (LTI) 
with an unknown impulse response function h(-) as described above. A typical signal x is neither 
sparse nor approximately sparse as we will see in Section ??. 

1.2 Our Approach 

Our CS problem (schematically shown in Figures ?? ??) boils down to determining whether there 
is a sampling operator G with 0(k log(n)) samples such that the signal x can be recovered uniquely 

1 This is often referred to as transform domain sparsity. 
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from the samples y = Gx = GHu using a convex optimization algorithm. It turns out that this is 
indeed the case when H is belongs to the set of stable finite dimensional AR processes of a known 
order. 

At first glance the problem as posed appears difficult. For one there is no reason GH satisfies 
isometry property when H is not orthonormal. To build intuition we describe a practically relevant 
problem. A specific example is when a; is a one-dimensional piecewise constant signal. Such a signal 
is not sparse but does have a sparse derivative, namely, u(t) = x(t) — x(t — 1) is sparse. Clearly, 
the signal x can represented as an output of an (integral) operator H acting on a sparse input u, 
namely, x = Hu. However, H is no longer orthonormal. To account for this scenario one usually 
minimizes the total variation (TV) of the signal. A compressed sensing perspective for this case 
has already been developed [?] . 

We develop an alternative approach here. Suppose we now filter x through an LTI system G 
whose impulse response is g(t). Mathematically, we have, 

y(t) = (g*x)(t) = (g*h* u)(t) = ((g * h) * u)(t) = (h * g * u)(t) 

Since, the composite system g * h is LTI we have that, 

z(t) = y(t) -y(t-l)=g* (x(t) - x{t - 1)) = (g * u)(t) 

Now we are in the familiar situation of z = Gu of the standard CS problem, except that G is a 
Toeplitz matrix. Consequently, if the Toeplitz matrix G satisfies the RIP property we can recover 
z using standard tools in CS. Indeed RIP properties of Toeplitz matrices have been studied [?]. 
Note that this idea generalizes to arbitrary but known finite dimensional stable LTI systems, H. 
The main idea being used here is the commutative property of convolutions. 

However, the question arises as to how to deal with unknown system HI It turns out that 
corresponding to every finite dimensional LTI system there is an annihilating filter [?]. If H is a 
pth order linear dynamical system it turns out that the annihilating filter, H 1 - is parameterized by p 
parameters. Now employing commutativity of convolution, namely, g*h = h*g, followed by filtering 
through the annihilator we are left with a linear characterization of the measurement equations. 
We are now in a position to pose a related i\ optimization problem where the parameters are the 
sparse signal z as well as the parameters governing the annihilating filter. Our proof techniques 
are based on duality theory. 

Strictly speaking, for AR models commutativity is not necessary. Indeed, we could consider 
general random projections, but this comes at a cost of increasing the number of measurements 
as we will see later. On the other hand RIP properties for random projections is (provably) 
significantly stronger than Toeplitz matrices. Nevertheless note that in the random excitation 
scenario of Figure ??, the structure does not lend itself to a random projection interpretation. For 
these reasons we consider both constructions in the paper. 

The paper is organized as follows. The mathematical formulation of the problem is presented 
in Section ??. Section ?? describes the new i\ minimization algorithm. The result for recovery 
with AR filtered processes (Theorem ??) is stated in this section. The proof of Theorem ?? can 
be found in Section ??. To help the reader understand the main idea of the proof we first consider 
a very simple case and Section ?? provides the proof for the general case. Section ?? addresses 
the blind-deconvolution problem, which can be regarded as a noisy version of our problem. We 
use LASSO to solve this problem and the detailed proof is provided in Section ??. In Section ??, 
we extend the our techniques to two related problems, namely, decoding of ARM A process and 
decoding of a non-causal AR process. Finally, simulation results are shown in Section ??. 
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2 Problem Set-up 



Our objective is to reconstruct an autoregressive (AR) process x(t) from a number of linear and 
non-adaptive measurements. An autoregressive model is known as an "all-pole" model, and has 
the general form 

p 

x(n) + '^2aix(t — i) = z(t) (1) 

i=l 

where z(t) is a sparse driving process. We assume the vector z = [zq, • • • , z n -i] T is /c-sparse, that is, 
there are only k non-zero components in z. The task of compressed sensing is to find the AR model 
coefficients a = [ai, • • • , a p ] T and the driving process z = [zq, • • • , z n -i] T from the measurement y. 
In this paper, we assume that the AR process x(t) is stable, that is, the magnitude of all the poles 
of the system is strictly smaller than 1. In later discussion, we use x% or x(t) interchangeably for 
convenience of exposition. 

Note that in standard CS setup, the signal x is assumed to be sparse in some known transform 
space. However, in our problem, the AR model is assumed to be unknown and the main contribution 
of this paper is to solve this new problem efficiently. 

We consider two types of compressed sensing scenarios: 

2.1 Toeplitz Matrices 

Here we realize m measurements by applying the sensing matrix G to signal x — [^o? " " " ■> ^n— l] 
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(2) 



where each entry is independent Gaussian random variable AA(0, 1) or independent Bernoulli ±1 
entries. Here the Toeplitz matrix G preserves the shift-structure of the signal. Roughly speaking, 
assume z' is a shifted version of z (disregarding the boundary effect), then Gz' is also just a shifted 
version of Gz. This is particularly suitable for the random excitation model of Figure ??. 

For notational purposes we denote by a;W (or GW) to denote the subvector of x (or submatrix 
of G) that is composed of the last s components (or s rows) of x (or G). By rearranging the above 
Equation ?? and using the shift-property of G, we have the following equation. 
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where we recall that z = [zq, • • • , z n -i] T . Now Equation ?? is simplified to 

Ya + y [m - p] = G [m ~ p] z (4) 
where a = [a±, ■ ■ ■ , a p ] T G MP and z £ M. n (/c-sparse) need to be decoded from the model. 
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2.2 Random Projections 

Here we consider randomly projecting the raw measurements x(t), namely, 

n-l 

y(t) = ^2 9t,rx(r), t = 0, 1, . . . , m 

where, each entry g^ T is an independent Gaussian random variable A/"(0, 1) or independent Bernoulli 
±1 entry. The reason for choosing random projections over random filters is that IID random 
Gaussian/Bernoulli matrix ensembles have superior RIP constants. The optimal RIP constants for 
toeplitz constructions has not been fully answered. Nevertheless, note that to form the matrix Y 
with random projections requires significantly more projections. This is because we can no longer 
exploit the shift-invariant property of convolutions. For instance, consider again the matrix Y of 
Equation ?? above: if random projections were employed instead of Toeplitz construction the entry 
?/i on row 1 will not be equal to the entry y\ in the second row. This means that for a pth order 
model we will require m x p measurements. 

Notation: To avoid any confusion, we use it* to denote the true spike train and u refers to any 
possible solution in the decoding algorithm. Similarly, a* represents the true coefficients. 



3 ^-minimization Algorithm for AR Models 

Since the AR model is unknown, standard decoding algorithms (e.g., Basis Pursuit [?], OMP [?], 
Lasso [?], etc.) can not be directly applied to this problem. However, we can regard the signal 
(it, a) (the original signal u together with the unknown coefficients a) as the new input to the model 
and (it, a) is still sparse if p (the length of a) is small. 

With this in mind we solve the following l\ minimization algorithm 

min llulli subject to Ya + y [m_p] = G^^u (5) 

More generally, when the measurement y is contaminated by noise, that is, the sensing model 
becomes y = Gx + w where w is Gaussian noise, the above LP algorithm will be replaced by Lasso, 

min -\\Ya + y{ m -rt - G^dlS + Alldli (6) 

where A is a tuning parameter that adapts to the noise level. 

Alternatively, the coefficient a can be solved from Equation ?? by taking pseudo-inverse of Y, 

a = (y T y)- 1 y T (G [m - p] u - y ^ m ~^ (7) 

Then Equation ?? becomes 

(/ - Y{Y T Y)~ 1 Y T )y^ n -^ = {i- y(y T y)- 1 y T )G [m - p] u 

and similar to Equation ?? we can apply the following l\ minimization to find the solution for u. 

min llulli subject to p y l m ~p] = PG [m - p] u (8) 
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where P denotes the projection matrix / — Y(Y T Y)~ l Y T and ||it||i denotes the t\ norm of u. Sup- 
pose the solution of Equation ?? is u. Then a can be easily derived by a = (Y T Y)~ 1 Y T (G^ m ~ p ^u — yl m ~ ; 
and the signal x(n) can be recovered through Equation ??. 

We note that Equation ?? is equivalent to Equation ?? if Y T Y is invertible, which is always 
assumed to be true in this paper. To summarize the above discussion, our algorithm is summarized 
below. 

(1) Inputs: Measurement y, sensing matrix G and order of the system p. 

(2) Compute u and a: Solve the i\ minimization (Equation ?? or ??) or Lasso (Equation ??). 

(3) Reconstruction: Recover the signal x(n) through forward propagation of the AR model of 
Equation ??. 

Before stating the main result, we recall that for every integer S the restricted isometry constant 
[?, ?], 5s is defined to be the smallest quantity such that G^ 1 p ^ obeys 

(1 " S s )\\x\\l < \\G [ ?- p] x\\l < (1 + S s )\\x\\l (9) 

for all subsets T C {0, 1, • • • , n — 1} of cardinality at most S and all (ijWy. 

Note that when the AR filter a(n) is known the result is a direct application of standard 
compressed sensing results. We state this without proof below for the sake of completion. In other 
words, if the coefficients a(n) are known, u*(-) is the true driving process in Equation ?? then u*(-) 
is the unique minimizer of 

min ||u||i subject to Ya + y^ 1 '^ = G [rn ~ p] u (10) 

A main result of our paper is the following where a(n) is assumed to be unknown. We need the 
following assumptions before we state our the theorem. 

1. Constant Order: We assume that p, the order of AR process x(n), is a constant (i.e., p 
does not scale with n,m,S). 

2. Exponential Decay: Suppose the impulse response \h(i)\ of the AR model satisfies 

\h(i)\ < Mp { 

for some constant M and < p < 1. 

3. Distance between Spikes: We define the constant I := (log(j^) + plog( 6 ^ axM )^ /log(p _1 )+ 

p and impose the condition that any two spikes, u*, u* satisfy \i — j\ > I, i 7^ j. This implies 
that the sparsity k := |Supp(n*)| < mm{S/l, S/3}. 

4. Spike Amplitude: We also assume that any spike is bounded, /3 m in < \uk\ < Anax, VA; £ 
Supp(-u*). 

Theorem 1. Suppose assumptions 1-4 above are satisfied. Let the integer S satisfy y~§s~ < 1- V 
u*(-) is the true driving process in Equation ?? then it is the unique minimizer of 

min \\u\\i subject to Ya + y^ m ^ = G^'^n (11) 
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Intuitively speaking, the condition in the theorem requires that the driving process u(n) is 
sparse enough and any two spikes (ui,Uj) are reasonably far away from each other. This type of 
assumption is actually also necessary. In section ??, we give an example where two spikes are 
consecutive and show that in this case x(n) can not be solved via equation ??. The proof of 
Theorem ?? is presented in Section ??. 

Remark 3.1. The reader might be curious as to whether a random convolution train provides 
benefits over random projection. Note that by using random convolutions we can naturally exploit 
shift-invariance property. Since Y E K m_ P x P as in Equation ?? is a partial Toeplitz matrix, we 
only need m output measurements. In contrast for a random projection, since we can no longer 
exploit this property, we would require 0{mp) measurements. 

3.1 Noisy Blind-deconvolution 

We consider the noisy blind-deconvolution problem with IID Gaussian noise, Wi ~ 7V(0,cj 2 ), and 
measurements 



(12) 

u{n). In this section we consider 



y(n) = x{n) + w(n) 

where the process x(n) is modeled by x(n) + X^=i a i x ( n ~ 5 
the problem of reconstructing the sparse spike train u(n) and coefficients a from the observed 
signals y{n). This problem is called "Blind deconvolution" [?, ?] and it is a simplified version of 
the Compressed Sensing problem where the sensing matrix G is identity matrix. To the best of our 
knowledge, even this simplified problem is still not completely solved in literature. Therefore, we 
focus on the uncompressed noisy version here. The noisy compressed version is technically more 
involved and will be reported elsewhere. 

Replacing x(n) with y(n) — w(n) in the AR model, we get 



l( n ) + J }2a i y( 



n — i) 



u(n) + e(n) 



(13) 



where we denote e(n) := w(n) + Yli=i a i w i n ~ *)■ 
Again by introducing 





Y 



yo 



Vp-i 



Vn-2 ■ ■ ■ Vn-p, 

we have the matrix-form system model 

y + Ya = u + e 
Here Lasso is used to solve the problem: 

1 2 

min - \\y + Ya — u U + A kt h 

«eR n ,aeRp 2 



(14) 



(15) 



7 



X 



We can show that the solution of Lasso is very close to the true a* and u* . Before stating the 
theorem, we first introduce some notation and technical conditions that will be used in the proof. 
We denote the noiseless version of Y as 

••• 

Xq ■■■ 

Xp-i ■■■ x 

Denote the support of u* as /. We define X\ as the matrix comprising of the rows of X indexed 
by / and X^ as the matrix comprising of the rows of X indexed by I c . We also denote x max = 
max; \xi\, u min = mim 6 / \m\ and a max = max; |dj|. 

We assume that the AR process x(n) satisfies the following set of conditions. 



(1) The smallest eigenvalue \ m i n (X 2 T X2) > > 

(2) HXfsgn^DIU < HarllzVKgH, 

2 

(3) x max > 2frVlog n and ^ff < min{ : 



Anpa 2 
(^2-l) 2 



for some constant c > 1. 



Acy/2pn ' y 24cpy'log n 



}• 



In practice, condition (1) is generally satisfied. For instance, if the signal x is persistent, 
TT^jT^Xj X2 converges to a constant invertible matrix. Condition (3) is also standard in compressed 
sensing, which says we need SNR > O(logra). In addition, we also need the assumption that no 
components are dominantly large (compared with the total energy of x). The upper bound for 
^max/H^lb can be relaxed but the current setup simplifies the analysis. 

Condition (2) is new. Let us consider two scenarios. In the first scenario, each spike in ui 
can be either positive or negative with equal probability (i.e. sgn(u/) is Bernoulli ±1). In this 
case, X^sgn(uj) behaves like a sub-Gaussian sum and it is usually upper bounded by ||x||2\/Iogn 
with high probability. On the other hand, let us also consider the case when all the spikes in ui 
are of the same sign, say positive. In this case each entry in Xf and sgn(n}) is positive and the 
inner product of these two aligned signals is typically much larger than the first scenario. This 
phenomena is also illustrated in the experiments shown in Figure ??. In the experiment, the AR 
model is Xt — lAxt-i + 0.45xt_2 = u(t). The blue curve corresponds to the scenario when sgn(tij) 
(uj is a spike) is Bernoulli ±1. The red curve corresponds to the case when the sign of any spike 
Ui is always +1. Each point on the curve is an average over 40 trials. We can see that in the 
first scenario (blue curve) we can tolerate many more spikes. To the best of our knowledge, this 
behavior does not exist in standard compressed sensing problem. 

Theorem 2. Denote P := I — Y(Y T Y)~ 1 Y T and assume condition (1),(2) and (3) stated above 
are satisfied. We also assume parameter A is chosen such that A > 6apa mauX ^\og n and u m i n > 2A, 
the solution to Lasso ?? is given by 



Uj 

a 



(PfP / )- 1 (Pfe - Xsgniu})) + u\ 



iY T Y)^Y T (y 



u 



(16) 
(17) 
(18) 



S 



Bernoulli-Gaussian Process, 2-order AR model, SNR = 28dB 



sign of Z is random 




sparsity k/n 

Figure 3: Comparison of two sign conditions for u. The AR model is xt — l.Axt-x + 0.45:Ef_2 = u(t). Noises 
ere added to the measurements and SNR = 28 dB. In one experiment, each sign of each spike is either 
positive or negative with equal probability. In the other experiment, the sign of the spikes is always positive. 

and we have sgn(u*) = sgn(u) with probability at least 1 — 8p/n — (p + l)2 _Tl//5 . 

Remark: The assumption u m i n > 2A implicitly implies an SNR bound O(logn) for the smallest 
spike. The assumption A > 6<rpa max ^/log n ensures A to be sufficiently large so that every non-spike 
element is shrunk to zero by the Lasso estimator. It is hard to analyze the case when parameter 
A is smaller because in this case it is not clear how to construct ujc which is critical for tractable 
KKT analysis. The choice of u in the Theorem ?? is motivated by the proof techniques used in [?]. 
The proof of Theorem ?? is presented in Section ??. 

4 Extensions 

In this section, we provide two interesting extensions to the AR model problem. First, we generalize 
AR model to the autoregressive moving average (ARMA) model, i.e., the process contains both 
poles and zeros in the transform function. Second, we develop an algorithm for the non-causal AR 
process, i.e., the current state not only depends on the past inputs but also depends on the future 
inputs. 

4.1 ARMA model 

The ARMA model takes the form 

P Q 

x ( n ) + ^2 dix(n — i) = u(n) + biu(n — i) (19) 
i=l i=l 

Again we use Equation ?? to obtain the measurement y = Gx where G is a Toeplitz matrix as 
defined in Section ??. Similar to what we have done in Section ??, we write down the matrix 
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representation of the ARMA model: 



20 

21 2 



Xp Xp—l • • • Xq 



1 

r/i 



JEn— 1 3?n— 2 ' ' ' 3Cn— p_ 

We denote the lower triangular matrix B as 



1 

h 1 

6 g • • • bi 

• • • 6, 



• • • 6i 1 



^n-2 



(20) 



J3 := 



1 
6i 1 



(21) 



By multiplying G^ m p l to both sides of Equation ??, we get 



(22) 



Note that for ARMA model we have an additional term B compared to Equation ??. Generally, 
matrix B is unknown. We first consider a simple situation when B is assumed to be known to the 
decoder. Based on Theorem ?? we can derive the following result. . 

Theorem 3 (Known Zero Locations). Given the same technical conditions as Theorem ?? and 
assume u* is the original sparse spike train that generates the ARMA process. Then u* is the 
unique minimizer of 



mm 



\i subject to Ya + y[ m " p ] = G [m ~ p] B 



u 



(23) 



Proof. Note that B is also a Toeplitz matrix. From the commutativity of Toeplitz matrix, we have 
Q[m- P ] B = ££[™-p1. Prom Section ??, the KKT conditions claim that u* is the unique minimizer 
of Equation ?? if and only if there exists a vector ir such that: 

1. (7r T G^ m ^B) . = sgn«) for all % G Supp(u*), 

2. | {t^G^-^B). I < 1 for all j Supp(u*), 

3. tt t Y = 0. 

Applying the commutativity and define tt t = tt t B, the above three conditions are converted to 

1. (Tt T G^ n -P ] ) i = sgn(n*) for all * G Supp(u*), 

2. | (tFG^-p^) . | < 1 for all j g" Supp(u*), 
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3. % T B~ X Y = 0. 



Note that both the inverse B~ 1 and the matrix Y are Toeplitz. Therefore, from commutativity, 
the third equation is equivalent to ^YB" 1 = 0. Finally, since B~ l is invertible, the last equation 
can be further simplified to tt t Y = 0. Now the KKT conditions look exactly the same as those in 
Section ??. Hence the corollary is proved by following the same argument as in Section ??. □ 

Now we consider the general situation when B is unknown. The difficulty of decoding lies in 
the fact that we know neither B nor the spike train u(n). There might exist different combinations 
of hi and u(n) that matches the measurements y(n). 

Here we propose an iterative algorithm for estimating (u,a,b) in Equation ??. Each iteration 
comprises of two basic steps. First, if B is known (from previous iteration), we can use the following 
t\ minimization algorithm to solve u and a (Theorem ??). 



min Hi s.t. \\Ya + y [m ~ p] - G [m - p] Bu\\ 2 < e 



(24) 



Here e > is required, even though there may not be any noise, to ensure that we do not get stuck 
in a local minima. 

Now once u is determined we switch from u to B, as the optimization variable. This problem 
reduces to a standard regression problem. First we rewrite Equation ?? as follows: 



Ya + y lm ~ p] = G [m - p] 



u 










"1" 


Ui 


Uq 















h 












U q 


U q -i ■ ■ 


u 














A. 


Un-l 


U n -2 ■ ■ 


U-n— q— 1_ 





which can be simplified to Ya + g/ m p l = G^ m p ^u + G^ m p ^Ub where we denote 



U 





»o 

Uq-l 



UQ 



Un-2 • • • U n -q-l_ 

Now we formulate the following least squares optimization problem: 

\Ya + y [m ~ p] - G [m ~ p] u - G [m ~ p] Ub\\ 2 



mm 



(25) 



In summary our iterative algorithm consists of the following steps: 
Initialization: Set &(°) = 0, i.e., B^ = I. 
Iteration k: Compute , , b^ 
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0.6 - 
0.4 - 
0.2 - 



step 30 



step 20 



e % 10 



step 60 



0.1 0.2 0.3 0.4 0.6 0.6 0.7 

b. 




Figure 4: The iterative algorithm on the model x(n) — 1.9x(n — 1) + 1.06a;(n — 2) — 0.144a;(n — 3) = 
u(n) + 0.7u(n — 1) + u(n — 2) where the correct b = [0.7 1]. Left: In trajectory of b in each round of 
iteration; Right: Zoom-in of the final stages of the iterations. Blue * corresponds to the rounds of updates 
with e = 3 while red * corresponds to the rounds of updates with a smaller e = 0.3 in the final stage. 

1. Update and via solving Equation ?? with B = B^ k ~^\ 

2. Update via solving least-square (Equation ??) with (it, a) = {u^,a^). 

There is a subtlety in the choice of parameter e in Equation ??. If e is large, the iterative 
algorithm appears to have a faster convergence rate but at the cost of significant bias. On the 
other hand, if e is small, the convergence rate is slow but the solution has small bias. Therefore, 
in practical implementation we choose e to be reasonably large in the early stages of the iteration 
and then decrease it to e/10 at the later stages of the iteration. 

Figure ?? illustrates a concrete example of solving the ARMA model x(n) — 1.9x(n — 1) + 
1.06x(n — 2) — 0.144x(n — 3) = u(n) + 0.7u(n — 1) + u(n — 2) by using our iterative algorithm. 
We choose e = 3 in the first 50 rounds of iteration and finally in the last 10 rounds of updates we 
set e = 0.3. Figure ??(b) is a zoom-in version of Figure ??(a) which shows the final stage of the 
algorithm. We can see the effects of choosing different value of e as well. 



4.2 Non-causal AR model 

Many real world signals are non-causal. For example, a 2D image is usually modeled by a Markov 
random field, where each pixel is dependent on all its neighboring pixels. In this subsection we 
consider this situation by modeling the signal to be a non-causal AR process. 
A non-causal AR model is defined as 

p p 

x{n) + aix(n — i) + a-ix{n + i) = u{n) (26) 

i=l i=l 

A typical non-causal AR process is shown in Figure ??. Here the impulse response of each spike 
is two-sided as opposed to the one-sided impulse response of causal AR process. In this subsection, 
we discriminate between two boundary conditions for the non-causal AR process. As we will show 
later, there are subtle differences in dealing with these two boundary conditions. 
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A non-causal AR process 




200 



Figure 5: A typical non-causal Autorcgrcssive process: x(n) — 0.375a;(n — 1) — 0.5x(n + 1) = u(n) 



1. Boundary is circulant, i.e., xq = x n , x\ = x n +i, • • • ; 

2. Boundary is not circulant. 

4.2.1 Circulant Boundary 

In this case we use the following circulant matrix in the sensing model y = Gx. 



G 



9n-m 9n-m-l " ' 90 9n-l 9n-2 " " " 9n-m+l 
9n-m+l 9n-m ' ' ' 9l 90 9n-l ' ' ' 9n-m 



9n-l 9n-2 



9m-l 9m-2 9m-3 



9o 



(27) 



where gi is i.i.d Gaussian random variable A/"(0, 1) or Bernoulli ±1 random variable. 

Since the boundary of x is circulant = x n -i), we can write the matrix representation of 
Equation ?? as 



Xq 


•En—1 


Xn—p 


x\ 


Xp 




1 " 

Ol 






X\ 


xo 


Xn—p+1 


X2 


■ x p+ i 






Ui 


Xp 


Xp—l 


X 


Xp+\ ■ 


X2p 




dp 
a-i 




Up 


Xn—1 


X n -2 ■ ■ 


Xn—p 


XQ 


■ X p -l_ 




a- p 




Un-l_ 



(28) 



With an abuse of notation, we use to denote the submatrix of G comprising rows i-th. 

through j-th. of G. Now we multiply Q[p+ l '- m -p\ to both sides of Equation ?? we get the following 
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equation. 



Hp Vp-i 
Vp+l Vy 

V2p V2p-1 

JJn—p Un—p—1 

We define matrix Y to be 
Y = 



yo 
yi 



y P +i 

Vp+2 



Vp V2p+1 



Vn-2p Vn-p+l 



V2p 
V2p+1 

Vn 



1 

a i 



Qlp+l-.m-p]^ 



Vp-x 

y v 



yo y P +i 
yx y P +2 



y2 P -x ■■■ y P V2p+x 



y2 P 
y2 P +x 

y3 P 



JJu—p—X ' ' ' y-n—2p y-n—p+X ' ' ' yn 

and finally Equation ?? is simplified to 

Jp+X:m-p] j r Y a = Q[p+X:m-p] u 



where a = [a±, ■ ■ ■ ,a p , a_i, • • • , a_ p ] T € 

As in Section ?? we can use either -^-minimization or Lasso to solve this problem. 



-minimization: min 

u€R™ a> 



|i s.t. y[P+ 1:m "P] + Ya = G lp+1:m ~ p] i 



(29) 



(30) 



Lasso: min l\\yb+ lim ^P] +Ya- G [p+V " ] uf 2 + Alltilli 
4.2.2 Non-circulant Boundary 

The case of non-circulant boundary is slightly more complicated. There are two ways of handling 
this situation. A simple approach is to view the problem as a perturbation of the circulant boundary 
case, namely, 

y ]p+X:m- P ] + fa + e = G lp+1:m ~ p] u 

where 



Q\p+X:m—p] 



%—l %n— 1 









x 



n—p 



p+X •^n—p+X 











•Era -^0 






-2 - Xp-2 
%n+p— X %p— X 



u n+p- 



Now one could use Lasso to solve this noisy model: 



mm 

«eM n ,a6R 2 P 2 



ly\p+X:m-p] +Ya - Gb> +Um -rtu\\l + A||u||i 
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Unfortunately, this approach will have a bias. To overcome this limitation, we consider the case 
where we can make an additional 2p set of measurements corresponding to the boundary conditions, 
namely, 



Um+l — X— p, 

Then by the denoting 



j n—pi j 



Vm+2p — ZCn—l- 



Y ~ Y + Q[p+ 1 - m -p] 












^n—p 



•E—p+1 -En—p+1 











•En -^0 



%n+p— 2 %p— 2 
Xn+p—1 %p— 1 



the sensing model can be simplified to the noiseless version 

y\p+l:m,-p] _|_ Y a = Q\P+~L'-m-p] u 

Again we can use either ^-minimization or Lasso to solve this model: 

^-minimization: min lluL s.t. y^+ v - m -^ + Ya = G [p+1:m " p] n 

Lasso: min I|| y b+ 1:m -p] + Ya - G^ +v - m -^u\\l + A|lul|i 

neK n ,aeIR 2 P 2 

5 Proof of Theorem ?? 

We first write down the primal and dual formulation of algorithm ??. 

min |H|i subject to Y a + = G [m ~ p] u 

whose dual formualtion is: 

max 7r T y[ m - p ] subject to \W T G [m - p] IU < 1, tt t Y = 

7reM m 



(31) 



(32) 



The proof is based on duality, u* is the unique minimizer of the primal problem ?? if we can 
find a dual vector ir with the following properties: 

1. (7r T G^ m -^) . = sgn(u*) for all i G Supp(u*), 

2. | (7r r Gfl m -rt) . | < 1 for all j Supp(n*), 

3. tt t Y = 0. 

where sgn(u*) denotes the sign of u* (sgn(u*) = for u* = 0) and Supp(u*) denotes the support of 
vector u* . The above set of conditions ensure that the primal-dual pair (u* ,tt) is not only feasible 
but also satisfy the complementary slackness condition, thus optimal. We call the above three 
conditions as the Dual Optimal Condition (DOC). 

The rest of this section is to construct a tt that satisfies the DOC. Our construction relies on 
the following result (see [?]). 
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Lemma 4 ([?]). Let S > 1 be such that 62s < ^, and c be a real vector supported on T obeying 
\T\ < S. Then there exists a vector it G K m such that (ir T G^ m ~ p ^) . = qVi € T. Furthermore, ir 
obeys 



< 



S S 



(1 - 35 2S )VS 



\ch Vj^T 



This lemma gives us the freedom to choose (arbitrarily) the value of ir T G^ m p l in the location 
of T while the magnitude of the rest components is still bounded. 



5.1 One Pole Case 

In this section we provide a proof for the simple case when x(n) is a first order AR process (i.e., 
p = 1) and u* only contains one spike (i.e., every entry of u* is zero except one place). Though 
simple, it contains the main idea of proof techniques for the more general case. Note that in this 
simple case the assumptions in Theorem ?? are automatically satisfied. 

For the 1-sparse driving process u* , without loss of any generality we assume uj$ = 1 and 
u* = (Vi > 1). We also denote a = — a as the root of the characteristic function of the first order 
AR process. Due to stability we have \a\ < 1. Now in condition 3 of DOC, the term tt t Y can be 
recast as 



tt t Y 



7T 



T Q[m—p] 











x 


= TT T G [m - p] 


1 


Xn-2_ 







In Lemma ??, we choose c as cq = 1, c\ = 1/2 and Cj = (j = 2, ■ ■ ■ , S — 1). Then Lemma ?? 
tells us that there exists a ix\ such that {k\ G^ m ~ p ^) . = q (V2 = 0, • • • ,S — 1) and furthermore 



< 



S s 



This implies 



vrf Y = vrf G^ 




1 



a" 



1 



n-l 
3=5 



^TQi-m-p] 



a- 



3-1 



(33) 



where the summation 



< 



summarize the above discussion, we find tt\ such that: 

1. (ttJG^) q = 1 

2. I (vrf G^ 1 '^). I < 1 for all j > 1, 

3. sgn(vrf Y) = 1. 



2|a|* 



V^(l-|a|) 



<C i. Therefore sgn(-7rfY) = 1. To 
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Similarly, by choosing cq = 1, c\ = — 1/2 and Cj = (j = 2, • • • ,5 — 1) in Lemma ??, there 
exists a 7T2 such that condition 1 and 2 of DOC are also satisfied while sgn(7r^Y) = — 1. Hence, by 
convexity there exists a A G (0, 1) such that for 7r = A7Ti + (1 — X)tt2, it satisfies 7r T Y = and also 
condition 1 and 2, i.e., the whole DOC. 

Finally we find a primal-dual pair (u* , ir) that satisfy all the feasible constraints and also the 
complementary slackness condition, which implies u* is the unique minimizer of the primal problem 
equation ??. 



5.2 General Case 

In this section we prove that in the general case the three conditions in Theorem ?? ensures the 
existence of a ir that satisfies the DOC. Before giving the proof, we point out that if some conditions 
in Theorem ?? are violated, there might not exist such a ir. Let us consider the case of p = 1 (first 
order AR process) and k=2 (only two entries of u(n) are nonzero). Moreover, we choose Uq = u\ = 1 
and u* = (Vi > 1), that is, the two spikes are next to each other. 

In this case [xq, x\, • • • , x n _i] T = [1,1 + a, a(l + a), • • • , a n ~ 2 (l + a)] T . We pick a = —1/2. 
Clearly the assumption \i — i'\ > Z, Vi, i' € Supp(n*) in Theorem ?? is broken. On the other hand, 
we can also check that there does not exist a it that satisfies the whole DOC condition. In fact, 
suppose 7r is chosen such that condition 1 and 2 are satisfied, then in checking condition 3 we find 



vr T Y = 7r T G^ 




1 

1 + a 



a 



n— 3/ 



l + a) 



n-l 



71- 1 



'[m— p] 



3=2 



j ~ 2 (l + a) > 1 - |«r 2 (l - |a|) > 

3=2 



which violates condition 3 in DOC. Hence there does not exist a ir that satisfies all the three 
conditions in DOC. 

Before proving Theorem ??, we need the following lemma in constructing ir. 

Lemma 5. Suppose the assumptions in Theorem ?? are satisfied. Denote T = {j + i : j G 
Supp(u*),0 < i < I}. Then \T\ < S and we also have the following inequalities: 



(1) Vj T and i = 0, 1, • • • ,p, < 



l-p 1 



(2) Vfc G U p i=1 {j -i:je Supp(u*)}, \x k \ < ^0 

(3) Vi = 0, 1, • • • ,p — 1 and Vj G Supp(u*), \xj/xj+i\ > r where r := l3ra ^ 1 p )■ — p l . 

Proof. First, from the assumption of Theorem ??, |T| < S. Then we need to verify the three 
properties. 

Suppose Uk is a new spike and k' be the next spike. Given i < (k' — k), we clearly have 



\xh+i\ < /3 max W(l + P l + P 21 + ■■■)< 



1 - p l 
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Hence properties (1) and (2) follow. 

We denote e := ^™** f J ■ Therefore, for any j G Supp(u*), \xj\ > \uj\ — ep l > f3 m i n — ep l . 
Combining with the above argument, we have 

\Xj/x j+i \ > (Anin - ep )/e = — — p 

Pmax-'" 

Note that when p l < 3 ^ m '" M as given in the theorem assumption, we have r > 3 a m[n M ■ Q 

Remark: Property (1) in Lemma ?? says that many components of x(n) are small. Property 
(2) ensures that before a new 'spike' Uj begins (j G Supp(n)), the amplitude of 
already negligible (i.e., very close to zero) such that the new impulse response caused by uj can be 
regarded as starting almost from zero level. Finally, property (3) says that when a new spike Uj 
arrives, the corresponding output Xj is reasonably large compared to its neighbors. 

Now we are ready to prove Theorem ??. Similar to the last section's argument, the objective is 
to find a sequence of vectors ttx, ■ ■ ■ , tt2p such that any of ir s (s = 1, • • • , 2 P ) satisfies the condition 
1 and 2 of DOC while 

sgn(vrf Y) = [1, 1, • • • , 1] T 
sgn(Tr^y) = [-1,1,-.. ,1] T 

sgn(^y) = [-1,-1,- ■■ ,-lf 

and this implies there exists a convex combination ir = X^s=i ^s^s which satisfies ir T Y = and 
also the condition 1 and 2 of DOC. 

Based on Lemma ??, we construct tx\ via fixing the values of {(nf G^ m ~ p ^)i}i & T '■= {ci}i^T- 

sgn(u*) if i G Supp(u*) 

(r/2Y~ j ~ 1 sgn(xj) if i = j + 1, • • • , j + p,Vj G supp(u*) (34) 
if i = j +p + l, ••• + G supp(n*) 

This choice of c gives the bound ||c||2 < y/k + k(l + 2 _1 + 2~ 2 + • • • ) < v3fc- Now by applying 
Lemma ??, we know there exists a tt\ such that (ttJ G^^) . = a when i G T and 



irfG [m - p] 



<J*<^|<1, 



where the last inequality follows from the assumption of the Theorem. Up to now we have shown 
that 7Ti satisfies condition 1 and 2 of DOC. Next we will check the sign of Trf Y. 
Fort = l,2,--- ,p, 

jo+p 

j=jo+l J j£T or jeSupp(u*) J 

io+p / io+t-i 

E E Cj^J-i + IE E W-* + E Cj^J-i 

j 6Supp(n*) j=jo+t \j GSupp(u*) j=jo+l j^T or jeSupp(u*) 
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where the magnitude of At can be lower bounded, 

\A t \ > Y, (r/Zy-^mUl - 2" 1 - 2" 2 2-(P~*)) > fc/W/^ 1 

j eSupp(u*) 

based on property (3) of Lemma ??. And the magnitude of B t is upper bounded, 

|Bt|< S + ' + ' +-) = fe (l- /3t )(l- p) 

ioeSupp(«*) 

When / > ^log(j^) + p log( 6 ft" axA/ )^ / log(p _1 ) + p as given by the assumption of the theorem, we 

have \Bt\ < \At\, which implies that the sign of (TrjY)t is determined by the sign of At. 
Hence sgn((7r^y)f) = sgn(^bj = 1. This implies 

sgn(7rf Y) = [1, 1, • • • , 1] T 

In general, for any sign pattern [s\, • • • , s p ] T (si S {—1, 1}), by choosing {ci}i & T (compare equa- 
tion ??) in the following way 

{sgn(ii*) if i e Supp(u*) 

Si ■ (r/2) i -^- 1 sgn(x i ) if i = j + 1, ■ ■ ■ ,j+p, Vj E supp(u*) 
if i = j + p + 1, • • • , j + I, \/j e supp(u*) 

and making similar arguments, we have 

sgn(yrfy) = [si,s 2) ■ ■ ■ ,s p ] T 

6 Proof of Theorem ?? 

To prove Theorem ??, we only need to check that (u,a) given in the theorem satisfy the KKT 
conditions. We denote the function f(u,a) = ^\\y + Ya — + A||u||i. Then the gradient of / with 
respect to a is 

^- = Y T {y + Ya-u) 
and the subgradient of / with respect to u is 

df 

tt- = - (y + Ya - u) + Xv 
ou 

where v satisfies vi = sgn(uj) for i £ I and \vi\ < 1 for i G I c . Therefore, we only need to check the 
following set of (in)equalities 

Y T (y + Ya-u) = (35) 
(y + Ya-u)i = Asgn(tii), in / (36) 
\{y + Ya-u)i\ < A, in = (37) 

We first check Equation ??. 
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Lemma 6. Equation ?? is satisfied with (u,a) given in Theorem ??. 
Proof. Actually, 



Y 1 (y + Ya-u) = Y 1 (y - u) + Y 1 Yd 



T (y -u) + Y T ] 
Y T (y - u) - Y T Y{Y 1 Y)- l Y 1 \y 

□ 



Next we check Equation ??. 



Lemma 7. Equation ?? is satisfied with (u, a) given in Theorem ?? with probability at least 1 — 
8p/n - (p + l)2" n / 5 . 

Proof. Note that P has the property that P 2 = P and PY = 0. Therefore by multiplying P to 
both sides of Equation ??, we have 

P y = p u * + p e (38) 

Now we can compute 

y + Ya-u = y - u-Y(Y T Y)- 1 Y T {y - u) 



(J 
(6) 



P(y - u) 

Pu* + Pe-Pu 

Pe + P/K-tij) 

Pe - PjiPfPjy 1 (Pfe - AsgnK)) 



where (a) follows from Equation ?? and (b) follows from the fact that n/c = <u^ c = 0. 

There is a small trick here. Since y + Ya — u = P(y — u) as we have shown and P 2 = P, we 
must have P(y + Ya — u) = y + Yd — u. This implies Equation ?? is correct: 

(y + Ya- u)i = Pf{y + Ya-u) 

= Pj {Pe - P^pTPi)" 1 {Pfe - Asgn(^))) 

( = } pj p e - Pfe + Asgn(n^) 
= Asgn(n^) = Asgn(wj) 

where (a) follows from PjP = Pj (i.e., P 2 = P) and the last equality holds true with probability 
at least 1 — 8p/n — (jp + l)2 _n//5 . The proof of last equality is similar to the proof of Lemma ?? 
and is omitted here. □ 

Verifying inequality ?? requires more effort. We first simplifies the formula for (y + Yd — u)ic 

Lemma 8. With {u, a) given in Theorem ??, we have 

(y + Ya- u)jc = -Y 2 (Y 2 T Y 2 )^ 1 Y 1 T A S on«) + (/ - Y 2 (Y 2 T Y 2 )- 1 Y 2 T )e /c (39) 

where we denote Y\ as the submatrix comprises of the rows ofY indexed by I and Y 2 as the submatrix 
comprises of the rows ofY indexed by 1°. 
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Proof. Following from the proof of Lemma ??, we have , 

(y + Ya- u)jc = Pj c (P - P I {FfPi)~ 1 Ff)e + P^P? Pj)" 1 Xsgniu}) 



(40) 



To simplify the above equation, we introduce P\\ G M. kxk as the matrix comprises of the rows 
of P indexed by I and the columns of P indexed by /. Similarly, P\ 2 S K( n_fc ) xfc is the matrix 
comprises of the rows of P indexed by / and the columns of P indexed by I c ; P 22 £ R( n_fc ) x ( n_fc ) 
is the matrix comprises of the rows of P indexed by I c and the columns of P indexed by I c . By 
this definition, after some column and row permutations, P can be rewritten as 
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P 
P T 

r l2 



Pl2 
P22 



It is easy to check that P\\ = Pf Pi and Pf 2 = PfcPi (since P 2 = P). Furthermore, 



P T 



pT p— 1 pT 
- r 12- r ll r I 



[ p i2 P22] - PI2P11 1 [Pn P12) 

= [0 -P 22 — -Pf2-fll 1 -f > 12] 

Hence, Equation ?? can be simplified to 

(y + Ya- u)jc = (P22 - P 1 T 2 Pf 1 1 Pi 2 )e/ c + AP 1 T 2 P n 1 sgn(nJ) 

We note that Pn, P12, -P22 can be expressed in terms of Y, Y\ and Y%. 

P n = I — Y\(Y T Y)~ l Yi 
P12 = -Y\(Y T Y)~ 1 Y 2 T 
P22 = I-Y 2 (Y T Y)- l Y? 

Moreover P^ 1 can be derived via matrix inversion lemma: 

p-i = {I- Yl {Y T Y)-^)^ 



Finally, we get 

AP 1 T 2 P n 1 sgn(^) 



(41) 



/ + Yi(Y T Y - Y?Y X )- X Y? = 1 + Y 1 (Y^Y 2 )- 1 Y^ 

= -y 2 {y t y)- x y?{i + yi(y 2 r y 2 )- 1 y 1 T )As g n(^) 
= -y 2 [{y t y)- 1 + (y T y)~ 1 y 1 T y 1 (y 2 T y 2 )- 1 ]y 1 T Asgn(nj) 
= -y 2 (y t yy x ^ 2 t y 2 + y 1 T y 1 ](y 2 T y 2 )- 1 y 1 T AsgnK) 

( = } -y 2 (y 2 T y 2 )- 1 y 1 T As g n(n}) 

where (a) follows from the fact that Y T Y = Y 2 Y 2 + Y^Y\ . And similarly by repeatedly using this 
fact we can find the following simplification 



(P22 - PuPnPu^ 



{I-Y^Yy^ejc 

-(Y^Yy^ii + Y^Y^Y^Y^r^ejc 
(/-y 2 (y 2 T y 2 )- 1 y 2 T ) e/ c 



□ 
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In order to justify the condition ??, we also need the following lemma. 
Lemma 9. The following three claims hold true: 

(i) w.p. at least 1 — p- (4/n + 2~ n ^), ||y 2 Te / c ll°o < 2 V ra log naa majX x majX ^/2p. 



(ii) w.p. at least 1 — 4p/n, \\Yj Xsgn(zj)\\ 00 < 2A||x||2\/log n 

(iii) w.p. at least 1 - 2^, A max ((ifYa) -1 ) < 2A max {(XTx 2 y 1 ) < ^ 

Proof. To prove (i), we try to bound the first component (Y^e/c) By definition, the first column 
of Y equals [0, y , ■ ■ ■ , y n -2\ T = [0, x , ■ ■ ■ , x n _ 2 ] T + [0, w , ■ ■ ■ , w; n _ 2 ] T . We also remember = 
Wi + J2 P j=i a i w i-j where Wi are i.i.d. Gaussian j\f(0, a 2 ). Hence, we have 



(Y 2 T e/c) 1 = ^ Xi-l(Wi + ^ a 3 W i-j) + ^2 W i~^ W i + ^2 a J W i-j) 
i£l c j=l i£l c j=l 

It is easy to check that the first term of RHS is zero-mean Gaussian random variable with 
variance < pa max £ max ^c 2 - It I s we H known that for standard Gaussian random variable t, Pr(|i| > 
a) < 2e~ a I 2 . So we conclude that with probability > 1 — 2/n 



i£l c j=l 



< aa 



max ^ max 



a/ 2pn log 



n 



It also can be proved that with probability > 1 — 2/n — 2 n / 5 



^ Wi-l{Wi + ^ a 3 w i-j) 
iel c j=l 



< 2pa max <7 2 v / ™log 



n 



We notice that cra raax _x raax _^2pn log n > 2pa max cj V n log n and hence claim (i) follows. 
Next, we prove claim (ii). Again, (Y 1 T Asgn(2;|)) 1 can be decomposed into two terms; 

(y 1 T Asgn(z})) 1 = (X 1 T Asgn(^)) 1 + ^ ^_ 1 Asgn(zJ) 

The first term is bounded from the assumption and the second term is Gaussian which is bounded 
by \o\j2n log n < A||x||2\/logn (assumption (3) in Subsection ??) w.p. > 1 — 2/n. 

For (iii), we only need to show that with high probability A m j n (Y^Y?) > |A m i n (XjA^), or 
Cmin (Y 2 ) > -^0-min (X2) where cx mm (A) denotes the smallest singular value of A. 

We denote the Gaussian noise matrix 



W 





W p -l 






w 



W n -2 



w 



n—p_ 
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and call W 2 as the submatrix that comprises of the rows of W indexed by 7 C . Then, we have 

(T m\ n (>?.) = min ||y 2 t|| 2 = min ||X 2 t + W^lh 
||*||2=1 ll*lla=l 

> min ||X 2 t|| 2 - max \\W 2 t\\ 2 = cr mill (X 2 ) - <r max (W 2 ) 

= 1 ||f||2 = l 

So the remaining work is to upper bound <7 max (W 2 ). A tight bound in this case is very difficult. 
However, the following bound is good enough for our proof. By denoting W 2 % as the i-th column 
of W 2 , we have 

G>"max(W 2 ) = max II W2* II 2 
P 2 = 1 



max i jY^{W 2 ,i,t)' 



where the second last inequality follows from Cauchy-Schwartz inequality. Then by the tail prob- 
ability of x 2 distribution, we have with probability 1 — 2~ n / 5 , 



o- max (W2) < y^Hll < V 2npa 2 
Then by applying assumption (1) in Subsection 11 we have proved the claim (hi). □ 
Finally, we can show that (u, a) satisfies the condition 11. 

Lemma 10. Equation 11 is satisfied with (u, a) given in Theorem 11 with probability at least 
1 - 8p/n - (p + l)2" n / 5 . 

Proof. From the tail probability of standard Gaussian Pr(|f| > a) < 2e _a2//2 , we know that with 
probability at least 1 — 2/n, maxj \wi\ < 2cr\/log n. Therefore the l 2 norm of all the rows of Y 2 
is upper bounded y/p^max + lay/log n) with probability at least 1 — 2/n. Combined with claim 
(hi) in Lemma 11, we know that the l 2 norm of all the rows of Y 2 (J 2 T Y 2 )- X is upper bounded 
|^j?(xmax + 2crVlog^) < 4c ^ max with probability at least 1 - 2/n - 2" n / 5 . 

Now we can verify that both -Y 2 (Y 2 T Y 2 y x Y^ 'Asgn(<u|) and (7 - Y 2 (Y 2 T Y 2 )~ 1 Y 2 T )e I c are small. 

First, based on claim (ii) in Lemma 11, with probability at least 1 — 2+ ^ P ~ 2~ n / 5 

II - >2(y 2 T y 2 )- 1 y 1 T Asgn(n})|| 00 < 4C ^f^ max • 2X\\x\\ 2 ^/l^ ■ ^ < A/3 

\\ x \\2 

where the last inequality follows from condition (3) in Subsection 11. 

Next, it is easy to bound He/cjloo < 2<7pa max \/logn < A/3 with probability at least 1 — 2/n. 
Also, we have with probability at least 1 — — (p + l)2~ n / 5 



11^2(^2 Y 2 )~ Y 2 e/ c ||oo < rj — 2Vnlognaa max x max V2p- jp < A/3 

II 3 - II 2 

where the last inequality follows from claim (i) of Lemma 11, condition (3) in Subsection 11 and 
the assumption A > 6apa max yTogn. □ 
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